In this paper, we study games with continuous action spaces and non-linear payoff functions. Our key insight
is that Lipschitz continuity of the payoff function allows us to provide algorithms for finding approximate
equilibria in these games. We begin by studying Lipschitz games, which encompass, for example, all concave
games with Lipschitz continuous payoff functions. We provide an efficient algorithm for computing
approximate equilibria in these games. Then we turn our attention to penalty games, which encompass biased
games and games in which players take risk into account. Here we show that if the penalty function
is Lipschitz continuous, then we can provide a quasi-polynomial time approximation scheme. Finally, we
study distance biased games, where we present simple strongly polynomial time algorithms for finding best
responses in $L_1$, $L_2^2$, and $L_\infty$ biased games, and then use these algorithms to provide strongly polynomial
algorithms that find 2/3, 5/7, and 2/3 approximations for these norms, respectively.
1. INTRODUCTION

The Nash equilibrium [Nash 1951] is the central solution concept studied in
game theory. However, recent advances have shown that computing an exact Nash
equilibrium is PPAD-complete [Chen et al. 2009; Daskalakis et al. 2009], and so there
are unlikely to be polynomial time algorithms for this problem. The hardness of computing
exact equilibria has led to the study of approximate equilibria: while an exact
equilibrium requires that all players have no incentive to deviate from their current
strategy, an $\epsilon$-approximate equilibrium requires only that their incentive to deviate is
less than $\epsilon$.
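As an illustration of this definition, the following minimal Python sketch computes each player's incentive to deviate (regret) in a bimatrix game and checks the approximate-equilibrium condition. The function names and the example game are hypothetical and chosen only for illustration; payoffs are assumed to lie in [0, 1], and the check uses a non-strict comparison.

```python
def regrets(R, C, x, y):
    """Return (row regret, column regret) of the mixed profile (x, y)."""
    n, m = len(R), len(R[0])
    # Expected payoff of every pure row against y and every pure column against x.
    row_payoffs = [sum(R[i][j] * y[j] for j in range(m)) for i in range(n)]
    col_payoffs = [sum(x[i] * C[i][j] for i in range(n)) for j in range(m)]
    # Expected payoff each player actually receives under (x, y).
    row_value = sum(x[i] * row_payoffs[i] for i in range(n))
    col_value = sum(y[j] * col_payoffs[j] for j in range(m))
    # A best response can always be taken pure in a bimatrix game.
    return max(row_payoffs) - row_value, max(col_payoffs) - col_value


def is_eps_equilibrium(R, C, x, y, eps):
    """True if no player can gain more than eps by deviating unilaterally."""
    return max(regrets(R, C, x, y)) <= eps


# Matching pennies with payoffs scaled to [0, 1] (illustrative data).
R = [[1, 0], [0, 1]]
C = [[0, 1], [1, 0]]
print(regrets(R, C, [0.5, 0.5], [0.5, 0.5]))
print(is_eps_equilibrium(R, C, [1, 0], [1, 0], 0.5))
```

The uniform profile has zero regret for both players, while the pure profile leaves the column player with regret 1, so it is not a 0.5-approximate equilibrium.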

A fruitful line of work has developed studying the best approximations that can
be found in polynomial time for bimatrix games, which are two-player strategic form
games. There, after a number of papers [Bosse et al. 2010; Daskalakis et al. 2007,
2009], the best known algorithm was given by [Tsaknakis and Spirakis 2008], who
provide a polynomial time algorithm that finds a 0.3393-equilibrium. A prominent
open problem is whether there exists a PTAS for this problem. The existence of an
FPTAS was ruled out by [Chen et al. 2009] unless PPAD = P. While the existence of a
PTAS remains open, there is however a quasi-polynomial approximation scheme given
by [Lipton et al. 2003].

In a strategic form game, the game is specified by giving each player a finite number 
of strategies, and then specifying a table of payoffs that contains one entry for every 
possible combination of strategies that the players might pick. The players are allowed 
to use mixed strategies, and so ultimately the payoff function is a convex combination 
of the payoffs given in the table. However, some games can only be modelled in a more 
general setting where the action spaces are continuous, or the payoff functions are 
non-linear.

For example, Rosen’s seminal work [Rosen 1965] considered a more general setting
of games, called concave games, where each player picks a vector from a convex set. The
payoff to each player is specified by a function that satisfies the following condition: if
every other player’s strategy is fixed, then the payoff to a player is a concave function
over his strategy space. Rosen proved that concave games always have an equilibrium.
A natural subclass of concave games, studied by [Caragiannis et al. 2014], is the class
of biased games. A biased game is defined by a strategic form game, a base strategy and
a penalty function. The players play the strategic form game as normal, but they all
suffer a penalty for deviating from their base strategy. This penalty can be a non-linear
function, such as the $L_2^2$ norm.

In this paper, we study the computation of approximate equilibria in such games.
Our main observation is that Lipschitz continuity of the players’ payoff functions allows
us to provide algorithms that find approximate equilibria. Several papers have
studied how the Lipschitz continuity of the players’ payoff functions affects the existence,
the quality, and the complexity of the equilibria of the underlying game.
[Azrieli and Shmaya 2013] studied many-player games and derived bounds on the
Lipschitz constant of the players’ utility functions that guarantee the existence
of a pure approximate equilibrium for the game. [Daskalakis and Papadimitriou
2014] proved that anonymous games possess pure approximate equilibria whose quality
depends on the Lipschitz constant of the payoff functions and the number of pure
strategies the players have, and proved that this approximate equilibrium can be computed
in polynomial time. Furthermore, they gave a polynomial-time approximation
scheme for anonymous games with many players and a constant number of pure strategies.
[Babichenko 2013] presented a best-reply dynamic for $n$-player Lipschitz anonymous
games with two strategies which reaches an approximate pure equilibrium in
$O(n \log n)$ steps. Recently, [Chen et al. 2015] proved that it is PPAD-complete to compute
an $\epsilon$-equilibrium in anonymous games with seven pure strategies, when $\epsilon$ is exponentially
small in the number of players. [Deb and Kalai 2015] studied how some
variants of the Lipschitz continuity of the utility functions are sufficient to guarantee
hindsight stability of equilibria.

1.1. Our contribution. 

Lipschitz games. We begin by studying a very general class of games, where each
player’s strategy space is continuous, and represented by a convex set of vectors, and
where the only restriction is that the payoff function is Lipschitz continuous. This
class encompasses, for example, every concave game in which the payoffs are Lipschitz
continuous. This class is so general that exact equilibria, and even approximate
equilibria, may not exist. Nevertheless, we give an efficient algorithm that either outputs
an $\epsilon$-equilibrium, or determines that the game has no exact equilibria. More precisely,
for $M$-player games that are $\lambda$-Lipschitz continuous in the $L_p$ norm, for $p \ge 2$, and where
$\gamma = \max \|x\|_p$ over all $x$ in the strategy space, we either compute an $\epsilon$-equilibrium or
determine that no exact equilibrium exists in time $O(M n^{Mk+l})$, where
$k = O(\lambda^2 M^2 p \gamma^2 / \epsilon^2)$ and $l = O(\lambda^2 p \gamma^2 / \epsilon^2)$.
Observe that this is a polynomial time algorithm when $\lambda$, $p$, $\gamma$, $M$,
and $\epsilon$ are constant.

To prove this result, we utilize a recent result of [Barman 2015], which states that
for every vector in a convex set, there is another vector that is $\epsilon$-close to the original
in the $L_p$ norm, and is a convex combination of $b$ points on the convex hull, where
$b$ depends on $p$ and $\epsilon$, but does not depend on the dimension. Using this result, together
with the Lipschitz continuity of the payoffs, allows us to reduce the task of finding an
$\epsilon$-equilibrium to checking only a small number of strategy profiles, and thus we get a
brute-force algorithm that is reminiscent of the QPTAS given by [Lipton et al. 2003]
for bimatrix games.

However, life is not so simple for us. Since we study a very general class of games,
verifying whether a given strategy profile is an $\epsilon$-equilibrium is a non-trivial task. It
requires us to compute a regret for each player, which is the difference between the
player’s best response payoff and their actual payoff. Computing a best response in a
bimatrix game is trivial, but for Lipschitz games, computing a best response may be a
hard problem. We get around this problem by instead giving an algorithm to compute
approximate best responses. Hence we find approximate regrets, and it turns out that
this is sufficient for our algorithm to work.

Penalty games. We then turn our attention to penalty games. In these games, the
players play a strategic form game, and their utility is the payoff achieved in the game
minus a penalty. The penalty function can be an arbitrary function that depends on
the player’s strategy. This is a general class of games that encompasses a number of
games that have been studied before. The biased games studied by [Caragiannis et al.
2014] are penalty games where the penalty is determined by the amount that a
player deviates from a specified base strategy. The biased model was studied in the
past by psychologists [Tversky and Kahneman 1974] and it is close to what they call
anchoring [Chapman and Johnson 1999; Kahneman 1992]. In their seminal paper,
[Fiat and Papadimitriou 2010] introduced a model for risk prone games. This model
resembles penalty games since the risk component can be encoded in the penalty function.
[Mavronicolas and Monien 2015] followed this line of research and provided results
on the complexity of deciding if such games possess an equilibrium.

We again show that Lipschitz continuity helps us to find approximate equilibria. The
only assumption that we make is that the penalty function is Lipschitz continuous in
an $L_p$ norm with $p \ge 2$. Again, this is a weak restriction, and it does not guarantee that
exact equilibria exist. Even so, we give a quasi-polynomial time algorithm that either
finds an $\epsilon$-equilibrium, or verifies that the game has no exact equilibrium.

Our result can be seen as a generalisation of the QPTAS given by [Lipton et al. 2003]
for bimatrix games. Their approach is to show the existence of an approximate equilibrium
with a logarithmic support. They proved this via the probabilistic method: if
we know an exact equilibrium of a bimatrix game, then we can take logarithmically
many samples from the strategies, and with positive probability playing the sampled
strategies uniformly will be an approximate equilibrium.
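The sampling step of this argument can be sketched as follows. This is a minimal illustration, not code from the paper: the function name, the seed, and the example strategy are assumptions. It draws $k$ pure strategies independently from a mixed strategy and returns the mixed strategy that plays the sampled multiset uniformly, so every probability is a multiple of $1/k$.

```python
import random
from collections import Counter
from fractions import Fraction


def sample_k_uniform(strategy, k, rng):
    """Draw k pure strategies i.i.d. from `strategy`; play the sample uniformly."""
    draws = rng.choices(range(len(strategy)), weights=strategy, k=k)
    counts = Counter(draws)
    # Each pure strategy gets probability (number of times sampled) / k.
    return [Fraction(counts[i], k) for i in range(len(strategy))]


rng = random.Random(0)          # fixed seed only to make the run repeatable
x_star = [0.5, 0.3, 0.2, 0.0]   # hypothetical equilibrium strategy
x_prime = sample_k_uniform(x_star, 20, rng)
print(x_prime)
```

The support of the sampled strategy has size at most $k$, which is what makes an exhaustive search over all such strategies feasible when $k$ is logarithmic.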

We take a similar approach, but since our games are more complicated, our proof
is necessarily more involved. In particular, for [Lipton et al. 2003], proving that the
sampled strategies are an approximate equilibrium only requires showing that the
expected payoff is close to the payoff of a pure best response. In penalty games, best
response strategies are not necessarily pure, and so the events that we must consider
are more complex.

Distance biased games. Finally, we consider distance biased games, which are a
subclass of penalty games that have been studied recently by [Caragiannis et al. 2014].
They showed that, under very mild assumptions on the bias function, biased games
always have an exact equilibrium. Furthermore, for the case where the bias function
is either the $L_1$ norm, or the $L_2^2$ norm, they give an exponential time algorithm for
finding an exact equilibrium.

Our results for penalty games already give a QPTAS for biased games, but we are
also interested in whether there are polynomial-time algorithms that can find non-trivial
approximations. We give a positive answer to this question for games where the
bias is the $L_1$ norm, the $L_2^2$ norm, or the $L_\infty$ norm. We follow the well-known approach
of [Daskalakis et al. 2009], who gave a simple algorithm for finding a 0.5-approximate
equilibrium in a bimatrix game. Their approach is as follows: start with an arbitrary
strategy $x$ for player 1, compute a best response $j$ for player 2 against $x$, and then
compute a best response $i$ for player 1 against $j$. Player 1 mixes uniformly between $x$
and $i$, while player 2 plays $j$.
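The three steps just described can be sketched in Python for an ordinary bimatrix game. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the arbitrary starting strategy is taken to be a pure row, payoffs are assumed to lie in [0, 1], and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def argmax(values):
    """Index of a largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def dmp_half_approx(R, C, start_row=0):
    """Best-respond twice, then let the row player mix half-half."""
    n, m = len(R), len(R[0])
    # Column player's best response j against the starting row.
    j = argmax([C[start_row][c] for c in range(m)])
    # Row player's best response i against column j.
    i = argmax([R[r][j] for r in range(n)])
    x = [Fraction(0)] * n
    x[start_row] += Fraction(1, 2)
    x[i] += Fraction(1, 2)
    y = [Fraction(int(c == j)) for c in range(m)]
    return x, y


def max_regret(R, C, x, y):
    """Largest gain any player can obtain by a unilateral deviation."""
    n, m = len(R), len(R[0])
    row = [sum(R[r][c] * y[c] for c in range(m)) for r in range(n)]
    col = [sum(x[r] * C[r][c] for r in range(n)) for c in range(m)]
    row_value = sum(x[r] * row[r] for r in range(n))
    col_value = sum(y[c] * col[c] for c in range(m))
    return max(max(row) - row_value, max(col) - col_value)


R = [[1, 0], [0, 1]]   # illustrative game with payoffs in [0, 1]
C = [[0, 1], [1, 0]]
x, y = dmp_half_approx(R, C)
print(x, y, max_regret(R, C, x, y))
```

In a biased game the two best-response steps are no longer simple maximisations over pure strategies, which is why dedicated best-response algorithms are needed there.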

We show that this algorithm also works for biased games, although the generalisation
is not entirely trivial. Again, this is because best responses cannot be trivially
computed in biased games. For the $L_1$ and $L_\infty$ norms, best responses can be computed
via linear programming, and for the $L_2^2$ norm, best responses can be formulated as a
quadratic program, and it turns out that this particular QP can be solved in polynomial
time by the ellipsoid method. However, none of these algorithms are strongly polynomial.
We show that, for each of the norms, best responses can be found by a simple
strongly-polynomial combinatorial algorithm. We then analyse the quality of approximation
provided by the technique of [Daskalakis et al. 2009]. We obtain a strongly
polynomial algorithm for finding a 2/3 approximation in $L_1$ and $L_\infty$ biased games, and
a strongly polynomial algorithm for finding a 5/7 approximation in $L_2^2$ biased games.
For the latter result, in the special case where the bias function is the inner product of
the player’s strategy we find a 13/21 approximation.

2. PRELIMINARIES 

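We start by fixing some notation. For each positive integer $n$ we use $[n]$ to denote
the set $\{1, 2, \ldots, n\}$, we use $\Delta^n$ to denote the $(n-1)$-dimensional simplex, and $\|x\|_p$
to denote the $p$-norm of a vector $x \in \mathbb{R}^d$, i.e.
$\|x\|_p = \left( \sum_{i \in [d]} |x_i|^p \right)^{1/p}$. Given a set
$X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$ we use $conv(X)$ to denote the convex hull of $X$.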

Games and strategies. A game with $M$ players can be described by a set of available
actions for each player and a utility function for each player that depends both on
his chosen action and the actions the rest of the players chose. For each player $i \in [M]$
we use $S_i$ to denote his set of available actions and we call it his strategy space. We will use
$x_i \in S_i$ to denote a specific action chosen by player $i$ and we will call it the strategy of
player $i$. Furthermore, we use $x = (x_1, \ldots, x_M)$ to denote a strategy profile of the game.
We use $T_i(x_i, x_{-i})$ to denote the utility of player $i$ when he plays the strategy $x_i$ and
the rest of the players play according to the strategy profile $x_{-i}$. A strategy $\hat{x}_i$ is a best
response against the strategy profile $x_{-i}$ if $T_i(\hat{x}_i, x_{-i}) \ge T_i(x_i, x_{-i})$ for all $x_i \in S_i$. The
regret player $i$ suffers under a strategy profile $x$ is the difference between the utility of
his best response and his utility under $x$, i.e. $T_i(\hat{x}_i, x_{-i}) - T_i(x_i, x_{-i})$.

$\lambda_p$-Lipschitz Games. We will use the notion of $\lambda_p$-Lipschitz continuity.

Definition 2.1 ($\lambda_p$-Lipschitz). A function $f : A \to \mathbb{R}$, with $A \subseteq \mathbb{R}^d$, is $\lambda_p$-Lipschitz
continuous if for every $x$ and $y$ in $A$, it is true that $|f(x) - f(y)| \le \lambda \cdot \|x - y\|_p$.

We call the game $\mathfrak{L} := (M, n, \lambda, p, \gamma, \mathcal{T})$ $\lambda_p$-Lipschitz if for each player $i \in [M]$

— the strategy space $S_i$ is the convex hull of $n$ vectors $y_1, \ldots, y_n$ in $\mathbb{R}^d$,

— $\max_{x_i \in S_i} \|x_i\|_p \le \gamma$,

— the utility function $T_i(x) \in \mathcal{T}$ is $\lambda_p$-Lipschitz continuous.

Two Player Penalty Games. A two player penalty game $\mathcal{P}$ is defined by a tuple
$(R, C, f_r(x), f_c(y))$, where $(R, C)$ is a bimatrix game and $f_r(x)$ and $f_c(y)$ are the penalty
functions for the row and the column player respectively. The utilities for the players
under a strategy profile $(x, y)$, denoted by $T_r(x, y)$ and $T_c(x, y)$, are given by

$T_r(x, y) = x^T R y - f_r(x)$,    $T_c(x, y) = x^T C y - f_c(y)$.

We will use $\mathcal{P}_\lambda$ to denote two player penalty games with $\lambda_p$-Lipschitz penalty functions.
A special class of penalty games is when $f_r(x) = x^T x$ and $f_c(y) = y^T y$. We call these
games inner product penalty games.
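To make the utility definition concrete, here is a small Python sketch that evaluates both players' utilities in a penalty game, using the inner product penalty as the example. The helper names and the example matrices are hypothetical and serve only as an illustration.

```python
def bilinear(x, A, y):
    """Compute x^T A y for lists x, y and a list-of-rows matrix A."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def inner_product_penalty(v):
    """The penalty v^T v used in inner product penalty games."""
    return sum(t * t for t in v)


def penalty_utilities(R, C, f_r, f_c, x, y):
    """Return (T_r(x, y), T_c(x, y)): bimatrix payoff minus each player's penalty."""
    return bilinear(x, R, y) - f_r(x), bilinear(x, C, y) - f_c(y)


R = [[1, 0], [0, 1]]   # illustrative payoff matrices
C = [[0, 1], [1, 0]]
print(penalty_utilities(R, C, inner_product_penalty, inner_product_penalty,
                        [1.0, 0.0], [0.5, 0.5]))
```

Because the penalty is subtracted from the bilinear payoff, a pure strategy is penalised most under the inner product penalty, so best responses in such games are typically mixed.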

Two Player Biased Games. This is a subclass of penalty games, where extra constraints
are added to the penalty functions $f_r(x)$ and $f_c(y)$ of the players. In this class
of games there is a base strategy for each player, and the penalty they receive is
increasing with the distance between the strategy they choose and their base strategy.
Formally, the row player has a base strategy $p \in \Delta^n$, the column player has a base
strategy $q \in \Delta^n$, and their strictly increasing penalty functions are defined as $f_r(\|x - p\|_s^t)$
and $f_c(\|y - q\|_l^m)$ respectively.

Two Player Distance Biased Games. This is a special class of biased games
where the penalty function is a fraction of the distance between the base strategy
of the player and his chosen strategy. Formally, a two player distance biased game $\mathcal{B}$
is defined by a tuple $(R, C, b_r(x, p), b_c(y, q), d_r, d_c)$, where $(R, C)$ is a bimatrix game,
$p \in \Delta^n$ is a base strategy for the row player, $q \in \Delta^n$ is a base strategy for the column
player, and $b_r(x, p) = \|x - p\|_s^t$ and $b_c(y, q) = \|y - q\|_l^m$ are penalty functions for the
row and the column player respectively. The utilities for the players under a strategy
profile $(x, y)$, denoted by $T_r(x, y)$ and $T_c(x, y)$, are given by

$T_r(x, y) = x^T R y - d_r \cdot b_r(x, p)$,    $T_c(x, y) = x^T C y - d_c \cdot b_c(y, q)$,

where $d_r$ and $d_c$ are non-negative constants.
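The row player's utility in a distance biased game can be evaluated as in the following Python sketch, shown for the three distance functions discussed in the introduction ($L_1$, $L_2^2$, and $L_\infty$). The function names, the dictionary of norms, and the example data are assumptions made for illustration.

```python
def bilinear(x, A, y):
    """Compute x^T A y for lists x, y and a list-of-rows matrix A."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def l1(v):
    return sum(abs(t) for t in v)


def l2_squared(v):
    return sum(t * t for t in v)


def linf(v):
    return max(abs(t) for t in v)


NORMS = {"L1": l1, "L2^2": l2_squared, "Linf": linf}


def row_utility(R, x, y, p, d_r, norm):
    """T_r(x, y) = x^T R y - d_r * b_r(x, p), with b_r the chosen distance to p."""
    diff = [xi - pi for xi, pi in zip(x, p)]
    return bilinear(x, R, y) - d_r * NORMS[norm](diff)


R = [[1, 0], [0, 1]]    # illustrative payoff matrix
x, y = [1.0, 0.0], [1.0, 0.0]
p = [0.5, 0.5]          # hypothetical base strategy of the row player
for name in NORMS:
    print(name, row_utility(R, x, y, p, 0.5, name))
```

The same strategy pair yields different utilities under the three distances, since the deviation from the base strategy is measured differently in each case.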

Solution Concepts. The standard solution concept in game theory is the notion of
equilibrium. A strategy profile is an equilibrium if no player can increase his utility
by unilaterally changing his strategy. A relaxed version of this concept is the approximate
equilibrium, or $\epsilon$-equilibrium. Intuitively, a strategy profile is an $\epsilon$-equilibrium
if no player can increase his utility by more than $\epsilon$ by unilaterally changing his strategy.
Formally, a strategy profile $x$ is an $\epsilon$-equilibrium in a game $\mathfrak{L}$ if for every player $i \in [M]$
it holds that

$T_i(x_i, x_{-i}) \ge T_i(x_i', x_{-i}) - \epsilon$ for all $x_i' \in S_i$.

In [Chen et al. 2009] it was proven that, unless P = PPAD, there is no FPTAS for
computing an $\epsilon$-NE in bimatrix games. The same result holds for the class of penalty
games where the penalty functions $f$ for the players depend on $n$, the size of the underlying
bimatrix game, and $\lim_{n \to \infty} f = 0$ for every player. Let $\mathcal{P}'$ denote this class of
games.

Theorem 2.2. Unless P = PPAD, there is no FPTAS for computing an $\epsilon$-equilibrium
in penalty games in $\mathcal{P}'$.

Proof. For the sake of contradiction suppose that there is an FPTAS for computing
an $\epsilon$-equilibrium for penalty games in $\mathcal{P}'$. Then given an $n \times n$ bimatrix game $(R, C)$,
define the penalty game $(R, C, f_r(x), f_c(y))$ from the family $\mathcal{P}'$ where $\lim_{n \to \infty} f_r(x) = 0$
and $\lim_{n \to \infty} f_c(y) = 0$. Let $(x^*, y^*)$ be an $\epsilon$-equilibrium for the penalty game. This
means that for all $x' \in \Delta^n$ it holds that $x^{*T} R y^* - f_r(x^*) \ge x'^T R y^* - f_r(x') - \epsilon$ or,
equivalently, $x^{*T} R y^* \ge x'^T R y^* - \epsilon'$, where $\epsilon' = \epsilon + f_r(x^*) - f_r(x')$. Similarly,
$x^{*T} C y^* \ge x^{*T} C y' - \epsilon''$, where $\epsilon'' = \epsilon + f_c(y^*) - f_c(y')$. But $\epsilon' = \epsilon'' = \epsilon$ when $n \to \infty$. Hence
$(x^*, y^*)$ is an $\epsilon$-NE for the bimatrix game $(R, C)$. This means that if there is an FPTAS
for computing an $\epsilon$-equilibrium in a penalty game in $\mathcal{P}'$ then there is an FPTAS for
computing an $\epsilon$-NE in $(R, C)$, which is a contradiction, unless P = PPAD. □

3. APPROXIMATE EQUILIBRIA IN $\lambda_p$-LIPSCHITZ GAMES

In this section, we give an algorithm for computing approximate equilibria in $\lambda_p$-Lipschitz
games. Note that our definition of a $\lambda_p$-Lipschitz game does not guarantee that
an equilibrium always exists. Our technique can be applied irrespective of whether an
exact equilibrium exists. If an exact equilibrium does exist, then our technique will
always find an $\epsilon$-equilibrium. If an exact equilibrium does not exist, then our
algorithm either finds an $\epsilon$-equilibrium or reports that the game does not have an exact
equilibrium.

We will utilize the following theorem that was recently proved in [Barman 2015].

Theorem 3.1 ([Barman 2015]). Given a set of vectors $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$,
let $conv(X)$ denote the convex hull of $X$. Furthermore, let $\gamma := \max_{x \in X} \|x\|_p$ for some
$2 \le p < \infty$. For every $\epsilon > 0$ and every $\mu \in conv(X)$, there exists a
$\frac{4 p \gamma^2}{\epsilon^2}$-uniform vector $\mu' \in conv(X)$ such that $\|\mu - \mu'\|_p \le \epsilon$.

If we combine Theorem 3.1 with Definition 2.1 we get the following lemma.

Lemma 3.2. Let $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, let $f : conv(X) \to \mathbb{R}$ be a $\lambda_p$-Lipschitz
continuous function for some $2 \le p < \infty$, let $\epsilon > 0$ and let
$k = \frac{4 \lambda^2 p \gamma^2}{\epsilon^2}$, where $\gamma := \max_{x \in X} \|x\|_p$.
Furthermore, let $f(x^*)$ be the optimum value of $f$. Then we can compute
a $k$-uniform point $x' \in conv(X)$ in time $O(n^k)$, such that $|f(x^*) - f(x')| \le \epsilon$.

Proof. From Theorem 3.1 we know that for the chosen value of $k$ there exists a
$k$-uniform point $x'$ such that $\|x' - x^*\|_p \le \epsilon / \lambda$. Since the function $f(x)$ is $\lambda_p$-Lipschitz
continuous, we get that $|f(x') - f(x^*)| \le \epsilon$. In order to compute this point we have to
exhaustively evaluate the function $f$ at all $k$-uniform points and choose the point that
maximizes (or minimizes) its value. Since there are $\binom{n+k-1}{k} = O(n^k)$ possible $k$-uniform
points, the lemma follows. □
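The exhaustive search used in this proof can be sketched in Python as follows. Here a point is taken to be $k$-uniform if it is the average of a multiset of $k$ of the given vectors, which is how the term is used in this section; the function names and the example objective are hypothetical.

```python
from itertools import combinations_with_replacement


def k_uniform_points(vertices, k):
    """Yield the average of every size-k multiset of the given vertices."""
    dim = len(vertices[0])
    for chosen in combinations_with_replacement(range(len(vertices)), k):
        yield tuple(sum(vertices[i][c] for i in chosen) / k for c in range(dim))


def approx_maximise(f, vertices, k):
    """Return a k-uniform point of conv(vertices) with the largest value of f."""
    return max(k_uniform_points(vertices, k), key=f)


# Illustrative objective on the 1-dimensional simplex; its maximiser is (0.3, 0.7).
vertices = [(1, 0), (0, 1)]
f = lambda x: -(x[0] - 0.3) ** 2 - (x[1] - 0.7) ** 2
print(approx_maximise(f, vertices, 10))
```

The number of multisets enumerated is $\binom{n+k-1}{k}$, so the search is polynomial in $n$ only when $k$ is treated as a constant.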

We now prove our result about Lipschitz games. In what follows we will study a
$\lambda_p$-Lipschitz game $\mathfrak{L} := (M, n, \lambda, p, \gamma, \mathcal{T})$. Assuming the existence of an exact Nash
equilibrium, we establish the existence of a $k$-uniform approximate equilibrium in the game
$\mathfrak{L}$, where $k$ depends on $M$, $\lambda$, $p$ and $\gamma$. Note that $\lambda$ depends heavily on $p$ and the utility
functions for the players.

Since by the definition of $\lambda_p$-Lipschitz games the strategy space $S_i$ for every player
$i$ is the convex hull of $n$ vectors $y_1, \ldots, y_n$ in $\mathbb{R}^d$, any $x_i \in S_i$ can be written as a
convex combination of the $y_j$'s. Hence, $x_i = \sum_{j=1}^{n} \alpha_j y_j$, where $\alpha_j \ge 0$ for every $j \in [n]$ and
$\sum_{j=1}^{n} \alpha_j = 1$. Then, $\alpha = (\alpha_1, \ldots, \alpha_n)$ is a probability distribution over the vectors
$y_1, \ldots, y_n$, i.e. vector $y_j$ is drawn with probability $\alpha_j$. Thus, we can sample a strategy
$x_i$ by the probability distribution $\alpha$.

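So, let $x^*$ be an equilibrium for $\mathfrak{L}$ and let $x'$ be a sampled uniform strategy profile
from $x^*$. For each player $i$ we define the following events

$\phi_i = \{ |T_i(x_i', x_{-i}') - T_i(x_i^*, x_{-i}^*)| \le \epsilon/2 \}$    (1)

$\pi_i = \{ T_i(x_i, x_{-i}') \le T_i(x_i', x_{-i}') + \epsilon \}$ for all possible $x_i$    (2)

$\psi_i = \{ \|x_i' - x_i^*\|_p \le \frac{\epsilon}{2 M \lambda} \}$    (3)

Notice that if all the events $\pi_i$ occur at the same time, then the sampled profile $x'$ is
an $\epsilon$-equilibrium. We will show that if for a player $i$ the event $\phi_i$ holds and the events
$\psi_j$ hold for all $j$, then the event $\pi_i$ has to be true too.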

Lemma 3.3. For all $i \in [M]$ it holds that $\phi_i \cap \bigcap_{j \in [M]} \psi_j \subseteq \pi_i$.

Proof. Suppose that both events $\phi_i$ and $\bigcap_{j \in [M]} \psi_j$ hold. We will show that the event
$\pi_i$ must be true too. Let $x_i$ be an arbitrary strategy, let $x_{-i}^*$ be the equilibrium strategy
profile for the rest of the players, and let $x_{-i}'$ be a sampled strategy profile from $x_{-i}^*$.
Since we assume that the event $\psi_j$ is true for all $j$ we get that

$\|x_{-i}' - x_{-i}^*\|_p \le \sum_{j \ne i} \|x_j' - x_j^*\|_p \le \sum_{j \ne i} \frac{\epsilon}{2 M \lambda} \le \frac{\epsilon}{2 \lambda}$.

Furthermore, since by assumption the utility functions for the players are $\lambda_p$-Lipschitz
continuous we have that

$|T_i(x_i, x_{-i}') - T_i(x_i, x_{-i}^*)| \le \frac{\epsilon}{2}$.

This means that

$T_i(x_i, x_{-i}') \le T_i(x_i, x_{-i}^*) + \frac{\epsilon}{2} \le T_i(x_i^*, x_{-i}^*) + \frac{\epsilon}{2}$    (4)

since $T_i(x_i^*, x_{-i}^*) \ge T_i(x_i, x_{-i}^*)$ for all possible $x_i$, because the strategy profile $(x_i^*, x_{-i}^*)$ is an
equilibrium of the game. Furthermore, since by assumption the event $\phi_i$ is true we get
that

$T_i(x_i^*, x_{-i}^*) \le T_i(x_i', x_{-i}') + \frac{\epsilon}{2}$.    (5)

Hence, if we combine the inequalities (4) and (5) we get that $T_i(x_i, x_{-i}') \le T_i(x_i', x_{-i}') + \epsilon$
for all possible $x_i$. Thus, if the events $\phi_i$ and $\psi_j$ for every $j \in [M]$ hold, then the event
$\pi_i$ holds too. □

We are ready to prove the main result of the section.

Theorem 3.4. In any $\lambda_p$-Lipschitz game $\mathfrak{L}$ that possesses an equilibrium and
any $\epsilon > 0$, there is a $k$-uniform strategy profile, with $k = \frac{16 M^2 \lambda^2 p \gamma^2}{\epsilon^2}$, that is an
$\epsilon$-equilibrium.

Proof. In order to prove the claim, it suffices to show that there is a strategy profile
where every player plays a $k$-uniform strategy, for the chosen value of $k$, such that
the events $\pi_i$ hold for all $i \in [M]$. Since the utility functions in $\mathfrak{L}$ are $\lambda_p$-Lipschitz
continuous it holds that $\bigcap_{i \in [M]} \psi_i \subseteq \bigcap_{i \in [M]} \phi_i$. Furthermore, combining that with
Lemma 3.3 we get that $\bigcap_{i \in [M]} \psi_i \subseteq \bigcap_{i \in [M]} \pi_i$. Thus, if the event $\psi_i$ is true for every
$i \in [M]$, then the event $\bigcap_{i \in [M]} \pi_i$ is true as well.

From Theorem 3.1, applied with accuracy $\frac{\epsilon}{2 M \lambda}$, we get that for each $i \in [M]$ there is a
$\frac{16 M^2 \lambda^2 p \gamma^2}{\epsilon^2}$-uniform point
$x_i'$ such that the event $\psi_i$ occurs with positive probability. The claim follows. □

Theorem 3.4 establishes the existence of a $k$-uniform approximate equilibrium, but
this does not immediately give us our approximation algorithm. The obvious approach
is to perform a brute force check of all $k$-uniform strategies, and then output the one
that provides the best approximation. There is a problem with this, however, since computing
the quality of approximation requires us to compute the regret for each player,
which in turn requires us to compute a best response for each player. Computing an
exact best response in a Lipschitz game is a hard problem in general, since we make
no assumptions about the utility functions of the players. Fortunately, it is sufficient
to instead compute an approximate best response for each player, and Lemma 3.2 can
be used to do this. The following lemma is a consequence of Lemma 3.2.

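Lemma 3.5. Let $x$ be a strategy profile for a $\lambda_p$-Lipschitz game $\mathfrak{L}$, and let $\hat{x}_i$ be a
best response for player $i$ against the profile $x_{-i}$. There is a
$\frac{4 \lambda^2 p \gamma^2}{\epsilon^2}$-uniform strategy
$x_i'$ that is an $\epsilon$-best response against $x_{-i}$, i.e. $|T_i(\hat{x}_i, x_{-i}) - T_i(x_i', x_{-i})| \le \epsilon$.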

Our goal is to approximate the approximation guarantee for a given strategy profile.
More formally, given a strategy profile $x$ that is an $\epsilon$-equilibrium, and a constant $\delta > 0$,
we want an algorithm that outputs a number within the range $[\epsilon - \delta, \epsilon + \delta]$. Lemma 3.5
allows us to do this. For a given strategy profile $x$, we first compute $\delta$-approximate
best responses for each player, then we can use these to compute $\delta$-approximate regrets
for each player. The maximum over the $\delta$-approximate regrets then gives us an
approximation of $\epsilon$ with a tolerance of $\delta$. This is formalised in the following algorithm.
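A minimal Python sketch of such a regret-approximation procedure is given below. It is an illustration written for this passage and not the paper's own listing: the representation of a game as a list of utility functions and a list of vertex sets, and all function names, are assumptions. Each player's best response is approximated by searching over all $l$-uniform points of that player's strategy space.

```python
from itertools import combinations_with_replacement


def uniform_points(vertices, l):
    """Yield every l-uniform point: the average of l vertices, repeats allowed."""
    dim = len(vertices[0])
    for chosen in combinations_with_replacement(vertices, l):
        yield tuple(sum(v[c] for v in chosen) / l for c in range(dim))


def approx_regret(utilities, vertex_sets, profile, l):
    """Maximum over players of (best l-uniform deviation payoff - current payoff)."""
    regrets = []
    for i, (T_i, vertices) in enumerate(zip(utilities, vertex_sets)):
        current = T_i(profile)
        # Replace only player i's strategy by each candidate deviation.
        best = max(
            T_i(profile[:i] + (dev,) + profile[i + 1:])
            for dev in uniform_points(vertices, l)
        )
        regrets.append(best - current)
    return max(regrets)


# Hypothetical two-player game: each strategy space is conv{(0,), (1,)}.
vertex_sets = [[(0.0,), (1.0,)], [(0.0,), (1.0,)]]
utilities = [
    lambda prof: -(prof[0][0] - prof[1][0]) ** 2,   # player 0 wants to match player 1
    lambda prof: -(prof[1][0] - 0.5) ** 2,          # player 1 wants to play 0.5
]
print(approx_regret(utilities, vertex_sets, ((0.0,), (0.5,)), 2))
```

The value of `l` controls the tolerance: by Lemma 3.5, a sufficiently large `l` makes each approximate best response payoff close to the exact one.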



Utilising the above algorithm, we can now produce an algorithm to find an approximate
equilibrium in Lipschitz games. The algorithm checks all $k$-uniform strategy
profiles, using the value of $k$ given by Theorem 3.4, and for each one computes an
approximation of the quality of approximation using the algorithm given above.
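The brute-force search described here might be sketched in Python as follows. This is a hedged illustration and not the paper's listing: the game representation, the function names, and the example game are assumptions. A profile is returned as soon as its approximate regret is less than $2\epsilon$, and `None` is returned if no $k$-uniform profile passes the check.

```python
from itertools import combinations_with_replacement, product


def uniform_points(vertices, l):
    """Yield every l-uniform point: the average of l vertices, repeats allowed."""
    dim = len(vertices[0])
    for chosen in combinations_with_replacement(vertices, l):
        yield tuple(sum(v[c] for v in chosen) / l for c in range(dim))


def approx_regret(utilities, vertex_sets, profile, l):
    """Maximum over players of (best l-uniform deviation payoff - current payoff)."""
    regrets = []
    for i, (T_i, vertices) in enumerate(zip(utilities, vertex_sets)):
        best = max(
            T_i(profile[:i] + (dev,) + profile[i + 1:])
            for dev in uniform_points(vertices, l)
        )
        regrets.append(best - T_i(profile))
    return max(regrets)


def find_approx_equilibrium(utilities, vertex_sets, k, l, eps):
    """Check every k-uniform profile; return one whose approximate regret is < 2*eps."""
    candidates = [list(uniform_points(v, k)) for v in vertex_sets]
    for profile in product(*candidates):
        if approx_regret(utilities, vertex_sets, profile, l) < 2 * eps:
            return profile
    return None   # no k-uniform profile passed the check


# Hypothetical two-player game: each strategy space is conv{(0,), (1,)}.
vertex_sets = [[(0.0,), (1.0,)], [(0.0,), (1.0,)]]
utilities = [
    lambda prof: -(prof[0][0] - prof[1][0]) ** 2,
    lambda prof: -(prof[1][0] - 0.5) ** 2,
]
print(find_approx_equilibrium(utilities, vertex_sets, 2, 2, 0.05))
print(find_approx_equilibrium(utilities, vertex_sets, 1, 2, 0.05))
```

In the example the search succeeds with `k = 2`, because the exact equilibrium is itself 2-uniform, and fails with `k = 1`, where only the vertices are examined.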



If the algorithm returns a strategy profile $x$, then it must be a $3\epsilon$-equilibrium. This
is because we check that an $\epsilon$-approximation of $\alpha(x)$ is less than $2\epsilon$, and therefore
$\alpha(x) < 3\epsilon$. Secondly, we argue that if the game has an exact Nash equilibrium, then
this procedure will always output a $3\epsilon$-approximate equilibrium. From Theorem 3.4
we know that, for the value of $k$ given there, there is a $k$-uniform strategy profile $x$ that is an
$\epsilon$-equilibrium for $\mathfrak{L}$. When we apply our approximate regret algorithm to $x$, to find an
$\epsilon$-approximation of $\alpha(x)$, the algorithm will return a number that is less than $2\epsilon$, hence
$x$ will be returned by the algorithm.

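To analyse the running time, observe that there are $\binom{n+k-1}{k} = O(n^k)$ possible
$k$-uniform strategies for each player, thus $O(n^{Mk})$ $k$-uniform strategy profiles. Furthermore,
our regret approximation algorithm runs in time $O(M n^l)$, where $l = \frac{4 \lambda^2 p \gamma^2}{\epsilon^2}$.
Hence, we get the next theorem.

Theorem 3.6. Given a $\lambda_p$-Lipschitz game $\mathfrak{L}$ that possesses an equilibrium and any
$\epsilon > 0$, a $3\epsilon$-equilibrium can be computed in time $O(M n^{Mk+l})$, where
$k = O(\frac{\lambda^2 M^2 p \gamma^2}{\epsilon^2})$ and $l = O(\frac{\lambda^2 p \gamma^2}{\epsilon^2})$.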

Notice that it might be computationally hard to decide whether a game possesses an
equilibrium or not. Nevertheless, our algorithm can be applied to any $\lambda_p$-Lipschitz
game, regardless of whether an exact equilibrium exists. If the
game does not possess an exact equilibrium then our algorithm either finds an approximate
equilibrium or decides that there is no $k$-uniform strategy profile that is an
$\epsilon$-equilibrium for the game, and thus that the game does not possess an exact equilibrium.

Theorem 3.7. For any $\lambda_p$-Lipschitz game $\mathfrak{L}$, in time $O(M n^{Mk+l})$ we can
either compute a $3\epsilon$-equilibrium, or decide that $\mathfrak{L}$ does not possess an exact equilibrium,
where $k = O(\frac{\lambda^2 M^2 p \gamma^2}{\epsilon^2})$ and $l = O(\frac{\lambda^2 p \gamma^2}{\epsilon^2})$.

4. A QUASI-POLYNOMIAL ALGORITHM FOR PENALTY GAMES 

In this section we present an algorithm that, for any $\epsilon > 0$, can compute an
$\epsilon$-equilibrium for any penalty game in $\mathcal{P}_\lambda$ in quasi-polynomial time. For the algorithm,
we take the same approach as we did in the previous section for Lipschitz games: we
show that if an exact equilibrium exists, then a $k$-uniform approximate equilibrium always
exists too, and provide a brute-force search algorithm for finding it. Once again,
since best response computation may be hard for this class of games, we must provide
an approximation algorithm for finding the quality of an approximate equilibrium. The
majority of this section is dedicated to proving an appropriate bound for $k$, to ensure
that $k$-uniform approximate equilibria always exist.

We first focus on penalty games that possess an exact equilibrium. So, let $(x^*, y^*)$ be
an equilibrium of the game and let $(x', y')$ be a $k$-uniform strategy profile sampled from
this equilibrium. We define the following four events:

$\phi_r = \{ |T_r(x', y') - T_r(x^*, y^*)| \le \epsilon/2 \}$

$\pi_r = \{ T_r(x, y') \le T_r(x', y') + \epsilon \}$ for all $x$

$\phi_c = \{ |T_c(x', y') - T_c(x^*, y^*)| \le \epsilon/2 \}$

$\pi_c = \{ T_c(x', y) \le T_c(x', y') + \epsilon \}$ for all $y$.

The goal is to derive a value for $k$ such that all four events above hold simultaneously
with positive probability, i.e. $\Pr(\phi_r \cap \pi_r \cap \phi_c \cap \pi_c) > 0$.

Note that in order to prove that $(x', y')$ is an $\epsilon$-equilibrium we only have to consider
the events $\pi_r$ and $\pi_c$. Nevertheless, as we show in Lemma 4.1, the events $\phi_r$ and
$\phi_c$ are crucial in our analysis. The proof of the main theorem boils down to the
events $\phi_r$ and $\phi_c$. Furthermore, proving that there is a $k$-uniform profile $(x', y')$ that
fulfills the events $\phi_r$ and $\phi_c$ too proves that the approximate equilibrium we compute
approximates the utilities the players receive under an exact equilibrium too.

In what follows we will focus only on the row player, since a similar analysis can be
applied for the column player too. Firstly we study the event $\pi_r$ and we show how we
can relate it with the event $\phi_r$.

Lemma 4.1. For all penalty games it holds that $\Pr(\pi_r^c) \le n \cdot e^{-k \epsilon^2 / 2} + \Pr(\phi_r^c)$.

Proof. We begin by introducing the following auxiliary events for all $i \in [n]$:

$\psi_{ri} = \{ R_i y' \le R_i y^* + \frac{\epsilon}{2} \}$.

We prove how the events $\psi_{ri}$ and the event $\phi_r$ are related with the event $\pi_r$. Assume
that the event $\phi_r$ and the events $\psi_{ri}$ for all $i \in [n]$ are true. Let $x$ be any mixed strategy
for the row player. Since by assumption $R_i y' \le R_i y^* + \frac{\epsilon}{2}$ and since $x$ is a probability
distribution, it holds that $x^T R y' \le x^T R y^* + \frac{\epsilon}{2}$. If we subtract $f_r(x)$ from each side we
get that $x^T R y' - f_r(x) \le x^T R y^* - f_r(x) + \frac{\epsilon}{2}$. This means that $T_r(x, y') \le T_r(x, y^*) + \frac{\epsilon}{2}$
for all $x$. But we know that $T_r(x, y^*) \le T_r(x^*, y^*)$ for all $x \in \Delta^n$, since $(x^*, y^*)$ is an
equilibrium. Thus, we get that $T_r(x, y') \le T_r(x^*, y^*) + \frac{\epsilon}{2}$ for all possible $x$. Furthermore,
since the event $\phi_r$ is true too, we get that $T_r(x, y') \le T_r(x', y') + \epsilon$. Thus, if the events
$\phi_r$ and $\psi_{ri}$ for all $i \in [n]$ are true, then the event $\pi_r$ must be true as well. Formally,
$\phi_r \cap \bigcap_{i \in [n]} \psi_{ri} \subseteq \pi_r$. Thus, $\Pr(\pi_r^c) \le \Pr(\phi_r^c) + \sum_{i \in [n]} \Pr(\psi_{ri}^c)$. Using the Hoeffding
bound, we get that $\Pr(\psi_{ri}^c) \le e^{-k \epsilon^2 / 2}$ for all $i \in [n]$. Our claim follows. □

With Lemma 4.1 in hand, we can see that in order to compute a value for $k$ it is sufficient
to study the event $\phi_r$. We introduce the following auxiliary events that we will
study separately:

$\phi_{ru} = \{ |x'^T R y' - x^{*T} R y^*| \le \epsilon/4 \}$

$\phi_{rb} = \{ |f_r(x') - f_r(x^*)| \le \epsilon/4 \}$.

It is easy to see that if both $\phi_{rb}$ and $\phi_{ru}$ are true, then the event $\phi_r$ must be true too;
formally, $\phi_{rb} \cap \phi_{ru} \subseteq \phi_r$. Using the analysis from [Lipton et al. 2003] we can prove that
$\Pr(\phi_{ru}^c) \le 2 e^{-k \epsilon^2 / 8}$. Thus, it remains to study the event $\phi_{rb}$.

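Lemma 4.2. $\Pr(\phi_{rb}^c) \le \frac{8 \lambda \sqrt{p}}{\epsilon \sqrt{k}}$.

Proof. Since we assume that the penalty function $f_r$ is $\lambda_p$-Lipschitz continuous,
the event $\phi_{rb}$ can be replaced by the event $\phi_{rb}' = \{ \|x' - x^*\|_p \le \epsilon / 4\lambda \}$. It is easy to see
that $\phi_{rb}' \subseteq \phi_{rb}$. Then, using the proof of Theorem 2 from [Barman 2015] we get that
$E[\|x' - x^*\|_p] \le \frac{2 \sqrt{p}}{\sqrt{k}}$. Thus, using Markov’s inequality we get that

$\Pr\left( \|x' - x^*\|_p \ge \frac{\epsilon}{4 \lambda} \right) \le \frac{E[\|x' - x^*\|_p]}{\epsilon / 4 \lambda} \le \frac{8 \lambda \sqrt{p}}{\epsilon \sqrt{k}}$. □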
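To get a feel for how this Markov bound scales, the following Python sketch evaluates the bound $8 \lambda \sqrt{p} / (\epsilon \sqrt{k})$ of Lemma 4.2 and computes a sample size $k$ that pushes it below a target $\delta$. The function names and the example numbers are hypothetical, and the computed $k$ is the smallest such integer only up to floating-point rounding.

```python
import math


def markov_failure_bound(lam, p, eps, k):
    """Upper bound 8 * lam * sqrt(p) / (eps * sqrt(k)) on the failure probability."""
    return 8 * lam * math.sqrt(p) / (eps * math.sqrt(k))


def samples_for_failure_below(lam, p, eps, delta):
    """An integer k for which the bound above is strictly below delta."""
    # Solve 8 * lam * sqrt(p) / (eps * sqrt(k)) < delta for k.
    threshold = (8 * lam * math.sqrt(p) / (eps * delta)) ** 2
    return math.floor(threshold) + 1


print(markov_failure_bound(1, 4, 0.5, 1024))
print(samples_for_failure_below(1, 4, 0.5, 0.5))
```

The required number of samples grows quadratically in $\lambda / \epsilon$ and linearly in $p$, and does not depend on the number of pure strategies $n$.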


We are ready to prove our theorem.

Theorem 4.3. For any equilibrium $(x^*, y^*)$ of a penalty game from the class $\mathcal{P}_\lambda$,
any $\epsilon > 0$, and any $k \in \Omega(\frac{\lambda^2 \log n}{\epsilon^2})$, there exists a $k$-uniform strategy profile $(x', y')$ such that:

(1) $(x', y')$ is an $\epsilon$-equilibrium for the game,

(2) $|T_r(x', y') - T_r(x^*, y^*)| \le \epsilon/2$,

(3) $|T_c(x', y') - T_c(x^*, y^*)| \le \epsilon/2$.


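Proof. Let us define the event $GOOD = \phi_r \cap \phi_c \cap \pi_r \cap \pi_c$. In order to prove our
theorem it suffices to prove that $\Pr(GOOD) > 0$. Notice that for the events $\phi_c$ and $\pi_c$
we can use the same analysis as for $\phi_r$ and $\pi_r$ and get the same bounds.

Thus, using Lemma 4.1 and the analysis for the events $\phi_{ru}$ and $\phi_{rb}$ we get that

$\Pr(GOOD^c) \le \Pr(\phi_r^c) + \Pr(\pi_r^c) + \Pr(\phi_c^c) + \Pr(\pi_c^c)$

$\le 2 \left( \Pr(\phi_r^c) + \Pr(\pi_r^c) \right)$

$\le 2 \left( 2 \Pr(\phi_r^c) + n \cdot e^{-k \epsilon^2 / 2} \right)$    (from Lemma 4.1)

$\le 2 \left( 2 \Pr(\phi_{ru}^c) + 2 \Pr(\phi_{rb}^c) + n \cdot e^{-k \epsilon^2 / 2} \right)$

$\le 2 \left( 4 e^{-k \epsilon^2 / 8} + \frac{16 \lambda \sqrt{p}}{\epsilon \sqrt{k}} + n \cdot e^{-k \epsilon^2 / 2} \right)$    (from Lemma 4.2)

$< 1$ for the chosen value of $k$.

Thus, $\Pr(GOOD) > 0$ and our claim follows. □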

Theorem 4.3 establishes the existence of a $k$-uniform strategy profile $(x', y')$ that
is an $\epsilon$-equilibrium. However, as in the previous section, we must provide an efficient
method for approximating the quality of approximation provided by a given strategy
profile. To do so, we first give the following lemma, which shows that approximate best
responses can be computed in quasi-polynomial time for penalty games.

Lemma 4.4. Let (x, y) be a strategy profile for a penalty game V\, and let i be a 

best response against y. There is an l-uniform strategy x', with I = that is an 

e-best response against y, i.e. Tr(x, y) < Tr{:x.f y) + e. 

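Proof. We will prove that $|T_r(\hat{x}, y) - T_r(x', y)| < \epsilon$, where $\hat{x}$ is the best response
against $y$, which implies our claim. Let
$\phi_1 = \{ |\hat{x}^T R y - x'^T R y| < \epsilon/2 \}$ and $\phi_2 = \{ |f_r(\hat{x}) - f_r(x')| < \epsilon/2 \}$. Notice that Lemma 4.2
does not use anywhere the fact that $x^*$ is an equilibrium strategy, thus it holds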
even if x* is replaced by x. Thus, Pr(02) < Furthermore, using the analysis 

from iLipton et al.ll20()^ again, we ca n pro ve that Pr{(p1) < 2e~^ and using similar 
arguments as in the proof of Theorem 4.3 it can be easily proved that for the chosen
value of $l$ it holds that $\Pr(\phi_1^c) + \Pr(\phi_2^c) < 1$, thus the events $\phi_1$ and $\phi_2$ occur together with positive
probability and our claim follows. □
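The search over uniform strategies that this lemma licenses can be illustrated with a short sketch. It is only a minimal Python illustration under stated assumptions, not the paper's Algorithm 1: the names `k_uniform_strategies` and `best_k_uniform_response`, and the `utility` callback (standing in for $x \mapsto T_r(x, y)$ with $y$ fixed), are hypothetical. It enumerates every multiset of $k$ pure strategies, so the number of candidates is $\binom{n+k-1}{k}$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def k_uniform_strategies(n, k):
    """Yield every k-uniform mixed strategy over n pure strategies.

    A k-uniform strategy is the uniform distribution over a multiset of k
    pure strategies, so every probability is a multiple of 1/k.
    """
    for multiset in combinations_with_replacement(range(n), k):
        strategy = [Fraction(0)] * n
        for i in multiset:
            strategy[i] += Fraction(1, k)
        yield strategy


def best_k_uniform_response(n, k, utility):
    """Return a k-uniform strategy maximising the supplied utility oracle.

    `utility` is an assumed callback mapping a strategy to a number.
    """
    return max(k_uniform_strategies(n, k), key=utility)
```

For example, with the toy utility `lambda x: x[0] - x[0] ** 2` (a linear payoff minus a quadratic penalty) and `k = 2`, the search returns a strategy that places probability 1/2 on the first pure strategy.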

Having given this lemma, we can reuse Algorithm 1, but with $l$ set as in Lemma 4.4,
to provide an algorithm that approximates the quality of approximation of a given strategy
profile. Then, we can reuse Algorithm 2, with $k$ set as in Theorem 4.3, to provide a
quasi-polynomial time algorithm that finds approximate equilibria in penalty games. Notice
again that our algorithm can be applied to games in which it is computationally hard to
verify whether an exact equilibrium exists. Our algorithm will either compute an approximate
equilibrium or fail to find one, in which case it decides that the game does
not possess an exact equilibrium.

Theorem 4.5. In any penalty game V\ with constant number of players and any 
e> Q, in quasi polynomial time we can either compute a He-equilibrium, or decide that 
Vx does not possess an exact equilibrium.









5. DISTANCE BIASED GAMES 

In this section, we focus on three particular classes of distance biased games, and we 
provide polynomial-time approximation algorithms for these games. We focus on the 
following three penalty functions: 

- $L_1$ penalty: $b_r(x, p) = \|x - p\|_1 = \sum_i |x_i - p_i|$.

- $L_2^2$ penalty: $b_r(x, p) = \|x - p\|_2^2 = \sum_i (x_i - p_i)^2$.

- $L_\infty$ penalty: $b_r(x, p) = \|x - p\|_\infty = \max_i |x_i - p_i|$.
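For concreteness, the three penalties and the resulting utility can be written down directly. The following is a minimal Python sketch; the function names are ours, and the utility form $T_r(x, y) = x^T R y - d_r \cdot b_r(x, p)$ is the one assumed throughout this section.

```python
def l1_penalty(x, p):
    """L1 distance between strategy x and base strategy p."""
    return sum(abs(xi - pi) for xi, pi in zip(x, p))


def l2_squared_penalty(x, p):
    """Squared L2 distance between x and p."""
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))


def linf_penalty(x, p):
    """L-infinity distance between x and p."""
    return max(abs(xi - pi) for xi, pi in zip(x, p))


def biased_utility(R, x, y, p, d, penalty):
    """Row player's utility x^T R y - d * penalty(x, p) (assumed form)."""
    bilinear = sum(
        x[i] * R[i][j] * y[j] for i in range(len(x)) for j in range(len(y))
    )
    return bilinear - d * penalty(x, p)
```

The column player's utility is obtained symmetrically, with $C$, $q$ and $d_c$ in place of $R$, $p$ and $d_r$.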

Our approach is to follow the well-known technique of [Daskalakis et al. 2009] that
finds a 0.5-NE in a bimatrix game. The algorithm that we will use for all three penalty
functions is given below.
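A minimal Python sketch of this base algorithm follows. It assumes the form used later in the proofs of Lemmas 5.5 and 5.6: $y^*$ is a best response against the base strategy $p$, $x$ is a best response against $y^*$, and the row player plays $x^* = \delta p + (1 - \delta) x$. The two best-response oracles are hypothetical callbacks; how to implement them for each penalty is the subject of the rest of this section.

```python
def base_algorithm(p, best_response_col, best_response_row, delta):
    """Sketch of the base algorithm for distance biased games.

    p                  -- base strategy of the row player
    best_response_col  -- assumed oracle: row strategy -> column best response
    best_response_row  -- assumed oracle: column strategy -> row best response
    delta              -- mixing weight in [0, 1]
    """
    # Step 1: the column player best-responds to the row player's base strategy.
    y_star = best_response_col(p)
    # Step 2: the row player best-responds to y_star.
    x = best_response_row(y_star)
    # Step 3: the row player mixes between the base strategy and that response.
    x_star = [delta * pi + (1 - delta) * xi for pi, xi in zip(p, x)]
    return x_star, y_star
```

The value of `delta` is chosen afterwards so that the regret bounds of the two players coincide.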



While this is a well-known technique for bimatrix games, note that it cannot immediately
be applied to penalty games. This is because the algorithm requires us to
compute two best response strategies, and while computing a best response is trivial
in bimatrix games, this is not the case for penalty games. Best responses for $L_1$ and
$L_\infty$ penalties can be computed in polynomial time via linear programming, and for
$L_2^2$ penalties, the ellipsoid algorithm can be applied. However, these methods do not
provide strongly polynomial algorithms.

In this section, for each of these penalties, we develop a simple combinatorial algorithm
for computing best response strategies. Our algorithms are
strongly polynomial. Then, we determine the quality of the approximation given by the
base algorithm when our best response techniques are used. In what follows we make
the common assumption that the payoffs of the underlying bimatrix game $(R, C)$ are
in $[0, 1]$.

5.1. A 2/3-approximation algorithm for $L_1$-biased games

We start by considering $L_1$-biased games. Suppose that we want to compute a best
response for the row player against a fixed strategy $y$ of the column player. We will
show that best response strategies in $L_1$-biased games have a very particular form: if
$b$ is the best response strategy in the (unbiased) bimatrix game $(R, C)$, then the best
response places all of its probability on $b$ except for a certain set of rows $S$ where it is
too costly to shift probability away from $p$. The rows $i \in S$ will be played with probability $p_i$ to
avoid taking the penalty for deviating.

The characterisation for whether it is too expensive to shift away from p is given by 
the following lemma. 

Lemma 5.1. Let $j$ be a pure strategy, let $k$ be a pure strategy with $p_k > 0$, and let
$x$ be a strategy with $x_k = p_k$. The utility for the row player increases when we shift
probability from $k$ to $j$ if and only if $R_j y - R_k y - 2d_r > 0$.

Proof. Suppose that we shift $\delta$ probability from $k$ to $j$, where $\delta \in (0, p_k]$. Then the
utility for the row player is equal to $T_r(x, y) + \delta \cdot (R_j y - R_k y - 2d_r)$, where the final term
is the penalty for shifting away from $k$. Thus, the utility for the row player increases
under this shift if and only if $R_j y - R_k y - 2d_r > 0$. □

Observe that, if we are able to shift probability away from a strategy $k$, then we should
obviously shift it to a best response strategy for the (unbiased) bimatrix game, since
this strategy maximizes the increase in our payoff. Hence, our characterisation of best
response strategies is correct. This gives us the following simple algorithm for computing
best responses.
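The characterisation can be turned into code almost directly. The sketch below is a minimal Python illustration, assuming the utility $x^T R y - d_r \|x - p\|_1$; the function name and the input `payoffs` (the vector with entries $R_i y$) are assumptions made here. It starts from $p$ and, for every pure strategy $k$ that satisfies the condition of Lemma 5.1 with respect to a best pure strategy $b$, moves all of $p_k$ onto $b$.

```python
def l1_best_response(payoffs, p, d):
    """Best response sketch for an L1-biased game.

    payoffs -- payoffs[i] is R_i y, the payoff of pure strategy i against y
    p       -- base strategy of the player
    d       -- penalty weight d_r
    """
    # A pure best response b of the unbiased bimatrix game.
    b = max(range(len(payoffs)), key=lambda i: payoffs[i])
    x = list(p)
    for k in range(len(p)):
        # Lemma 5.1: shifting from k to b pays off iff R_b y - R_k y - 2d > 0.
        if k != b and payoffs[b] - payoffs[k] - 2 * d > 0:
            x[b] += x[k]
            x[k] = 0.0
    return x
```

With `payoffs = [1.0, 0.5, 0.1]`, `p = [0.2, 0.3, 0.5]` and `d = 0.3`, only the third strategy is worth abandoning, so its probability moves to the first strategy. A single pass over the pure strategies suffices, so the running time is linear once the payoffs $R_i y$ are known.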



Our characterisation has a number of consequences. Firstly, it can be seen that if
$d_r \ge 1/2$, then there is no profitable shift of probability between any two pure strategies,
since $0 \le R_i y \le 1$ for all $i \in [n]$. Thus, we get the following corollary.

Corollary 5.2. If $d_r \ge 1/2$, then $p$ is a dominant strategy.

Moreover, since we can compute a best response in polynomial time we get the next 
theorem. 

Theorem 5.3. In biased games with $L_1$ penalty functions and $\max\{d_r, d_c\} \ge 1/2$,
an equilibrium can be computed in polynomial time.

Finally, using the characterization of best responses we can see that there is a connection
between the equilibria of the distance biased game and the well supported
Nash equilibria (WSNE) of the underlying bimatrix game.

Theorem 5.4. Let $B = (R, C, b_r(x, p), b_c(y, q), d_r, d_c)$ be a distance biased game
with $L_1$ penalties and let $d := \max\{d_r, d_c\}$. Any equilibrium of $B$ is a $2d$-WSNE for the
bimatrix game $(R, C)$.

Proof. Let $(x^*, y^*)$ be an equilibrium for $B$. From the best response algorithm for
$L_1$ penalty games we can see that $x_i^* > 0$ if and only if $R_b \cdot y^* - R_i \cdot y^* - 2d_r \le 0$, where $b$ is
a pure best response against $y^*$. This means that for every $i \in [n]$ with $x_i^* > 0$, it holds
that $R_i \cdot y^* \ge \max_{j \in [n]} R_j \cdot y^* - 2d$. Similarly, it holds that $C_i^T \cdot x^* \ge \max_{j \in [n]} C_j^T \cdot x^* - 2d$
for all $i \in [n]$ with $y_i^* > 0$. This is the definition of a $2d$-WSNE for the bimatrix game
$(R, C)$. □
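The well-supportedness condition used in this proof is easy to check mechanically. The following minimal Python sketch (the function name is ours, and matrices are assumed to be given as lists of rows) tests whether a profile is an $\epsilon$-WSNE of a bimatrix game: every pure strategy in a player's support must be within $\epsilon$ of that player's best pure payoff.

```python
def is_wsne(R, C, x, y, eps):
    """Check whether (x, y) is an eps-well-supported NE of the bimatrix game (R, C)."""
    n, m = len(R), len(R[0])
    # Expected payoff of each pure row strategy against y.
    row_payoffs = [sum(R[i][j] * y[j] for j in range(m)) for i in range(n)]
    # Expected payoff of each pure column strategy against x.
    col_payoffs = [sum(C[i][j] * x[i] for i in range(n)) for j in range(m)]
    row_ok = all(
        row_payoffs[i] >= max(row_payoffs) - eps for i in range(n) if x[i] > 0
    )
    col_ok = all(
        col_payoffs[j] >= max(col_payoffs) - eps for j in range(m) if y[j] > 0
    )
    return row_ok and col_ok
```

Applied to an equilibrium of an $L_1$-biased game with `eps = 2 * max(d_r, d_c)`, the theorem says this check should succeed.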

5.1.1. Approximation algorithm. We now analyse the approximation guarantee provided
by the base algorithm for $L_1$-biased games. So, let $(x^*, y^*)$ be the strategy profile that is
returned by the base algorithm. Since we have already shown that exact Nash equilibria
can be found in games with either $d_c \ge 1/2$ or $d_r \ge 1/2$, we will assume that both
$d_c$ and $d_r$ are less than $1/2$, since this is the only interesting case.

We start by considering the regret of the row player. The following lemma will be 
used in the analysis of all three of our approximation algorithms. 

Lemma 5.5. Under the strategy profile $(x^*, y^*)$ the regret for the row player is at
most $\delta$.





Proof. Notice that for all $i \in [n]$ we have

$|\delta p_i + (1 - \delta) x_i - p_i| = (1 - \delta) |x_i - p_i|$,

hence $\|x^* - p\|_1 = (1 - \delta) \|x - p\|_1$ and $\|x^* - p\|_\infty = (1 - \delta) \|x - p\|_\infty$. Furthermore, notice
that $\|x^* - p\|_2^2 = (1 - \delta)^2 \|x - p\|_2^2$, thus $\|x^* - p\|_2^2 \le (1 - \delta) \|x - p\|_2^2$.

Hence for the payoff of the row player it holds that $T_r(x^*, y^*) \ge \delta \cdot T_r(p, y^*) + (1 - \delta) \cdot T_r(x, y^*)$
and his regret under the strategy profile $(x^*, y^*)$ is

$\mathcal{R}^r(x^*, y^*) = \max_{\tilde{x}} T_r(\tilde{x}, y^*) - T_r(x^*, y^*)$

$= T_r(x, y^*) - T_r(x^*, y^*)$ (since $x$ is a best response against $y^*$)

$\le \delta \left( T_r(x, y^*) - T_r(p, y^*) \right)$

$\le \delta$ (since $\max_{\tilde{x}} T_r(\tilde{x}, y^*) \le 1$ and $T_r(p, y^*) \ge 0$).

□
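The scaling identities at the start of this proof can be checked on a small instance. The sketch below is only an illustration in Python with exact rational arithmetic; the function name and the example vectors are our own choices. It computes the three penalties of $x^* = \delta p + (1 - \delta) x$ with respect to $p$.

```python
from fractions import Fraction


def penalties_after_mixing(p, x, delta):
    """Return (L1, L-infinity, squared L2) distances between x* and p,
    where x* = delta * p + (1 - delta) * x."""
    x_star = [delta * pi + (1 - delta) * xi for pi, xi in zip(p, x)]
    diffs = [abs(si - pi) for si, pi in zip(x_star, p)]
    return sum(diffs), max(diffs), sum(d * d for d in diffs)
```

For $p = (1/2, 1/2, 0)$, $x = (0, 0, 1)$ and $\delta = 2/3$ we have $\|x - p\|_1 = 2$, $\|x - p\|_\infty = 1$ and $\|x - p\|_2^2 = 3/2$, and the function returns $2/3$, $1/3$ and $1/6$: the first two scale by $1 - \delta$ and the third by $(1 - \delta)^2$.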

Next, we consider the regret of the column player. The following lemma will be used
for both the $L_1$ case and the $L_\infty$ case. Observe that in the $L_1$ case, the precondition
$d_c \cdot b_c(y^*, q) \le 1$ always holds, since we have $\|y^* - q\|_1 \le 2$ and
we are only interested in the case where $d_c < 1/2$.

Lemma 5.6. If $d_c \cdot b_c(y^*, q) \le 1$, then under strategy profile $(x^*, y^*)$ the column
player suffers at most $2 - 2\delta$ regret.

Proof. The regret of the column player under the strategy profile $(x^*, y^*)$ is

$\mathcal{R}^c(x^*, y^*) = \max_y T_c(x^*, y) - T_c(x^*, y^*)$

$= \max_y \{ (1 - \delta) T_c(x, y) + \delta T_c(p, y) \} - (1 - \delta) T_c(x, y^*) - \delta T_c(p, y^*)$

$\le (1 - \delta) \left( \max_y T_c(x, y) - T_c(x, y^*) \right)$ (since $y^*$ is a best response against $p$)

$\le (1 - \delta) \left( 1 + d_c \cdot b_c(y^*, q) \right)$ (since $\max_y T_c(x, y) \le 1$)

$\le (1 - \delta) \cdot 2$ (since $d_c \cdot b_c(y^*, q) \le 1$).

□

To complete the analysis, we must select a value for $\delta$ that equalises the two regrets.
It can easily be verified that setting $\delta = 2/3$ ensures that $\delta = 2 - 2\delta$, and so we have
the following theorem.

Theorem 5.7. In biased games with $L_1$ penalties a $2/3$-equilibrium can be computed
in polynomial time.
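The balancing step is a one-line calculation: if the row player's regret is at most $\delta$ and the column player's regret is at most $c(1 - \delta)$, the two bounds coincide at $\delta = c/(1 + c)$. A minimal Python sketch, with a function name of our own choosing:

```python
from fractions import Fraction


def balancing_delta(c):
    """Return the delta solving delta = c * (1 - delta), i.e. c / (1 + c)."""
    c = Fraction(c)
    return c / (1 + c)
```

With $c = 2$, as in Lemma 5.6, this gives $\delta = 2/3$. A column bound of $2.5(1 - \delta)$, as obtained for the next penalty, gives $\delta = 5/7$.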

5.2. A 5/7-approximation algorithm for $L_2^2$-biased games

We now turn our attention to biased games with an $L_2^2$ penalty. Again, we start by
giving a combinatorial algorithm for finding a best response. Throughout this section,
we fix $y$ as a column player strategy, and we will show how to compute a best response
for the row player.

Best responses in $L_2^2$-biased games can be found by solving a quadratic program,
and actually this particular quadratic program can be solved via the ellipsoid algorithm
[Kozlov et al. 1980]. We will give a simple combinatorial algorithm that uses
the Karush-Kuhn-Tucker (KKT) conditions, and produces a closed formula for the solution.
Hence, we will obtain a strongly polynomial time algorithm for finding best
responses.




Our algorithm can be applied to $L_2^2$ penalty functions and any value of $d_r$, but for
notational simplicity we describe our method for $d_r = 1$. Furthermore, we define $\alpha_i :=
R_i y + 2p_i$ and we call $\alpha_i$ the payoff of pure strategy $i$. Then, the utility for the row
player can be written as $T_r(x, y) = \sum_{i=1}^{n} x_i \cdot \alpha_i - \sum_{i=1}^{n} x_i^2 - p^T p$. Notice that the term
$p^T p$ is a constant and it does not affect the solution of the best response, so we can
exclude it from our computations. Thus, a best response for the row player against
strategy $y$ is the solution of the following quadratic program

maximize $\sum_{i=1}^{n} x_i \cdot \alpha_i - \sum_{i=1}^{n} x_i^2$

subject to $\sum_{i=1}^{n} x_i = 1$

$x_i \ge 0$ for all $i \in [n]$.

The Lagrangian function for this problem is

$\mathcal{L}(x, y, \lambda, u) = \sum_{i=1}^{n} x_i \cdot \alpha_i - \sum_{i=1}^{n} x_i^2 - \lambda \left( \sum_{i=1}^{n} x_i - 1 \right) - \sum_{i=1}^{n} u_i x_i$

and the corresponding KKT conditions are

$\alpha_i - \lambda - 2x_i - u_i = 0$ for all $i \in [n]$ (6)

$\sum_{i=1}^{n} x_i = 1$ (7)

$x_i \ge 0$ for all $i \in [n]$ (8)

$x_i \cdot u_i = 0$ for all $i \in [n]$. (9)

Constraints (6)-(8) are the stationarity conditions and (9) are the complementary
slackness conditions. We say that strategy $x$ is a feasible response if it satisfies the KKT
conditions. The obvious way to compute a best response is to exhaustively check all
$2^n$ possible combinations for the complementarity conditions and choose the feasible
response that maximizes the utility of the player. Next we show how we can bypass this
brute force technique and compute all best responses in polynomial time.

In what follows, without loss of generality, we assume that $\alpha_1 \ge \ldots \ge \alpha_n$. That is,
the pure strategies are ordered according to their payoffs. In the next lemma we prove
that in every best response, if a player plays pure strategy $l$ with positive probability,
then he must play every pure strategy $k$ with $k < l$ with positive probability.

Lemma 5.8. In every best response $x^*$, if $x_l^* > 0$ then $x_k^* > 0$ for all $k < l$.

Proof. For the sake of contradiction suppose that there is a best response $x^*$ and
a $k < l$ such that $x_l^* > 0$ and $x_k^* = 0$. Let us denote $M = \sum_{i \notin \{l, k\}} \alpha_i \cdot x_i^* - \sum_{i \notin \{l, k\}} (x_i^*)^2$.
Suppose now that we shift some probability, denoted by $\delta$, from pure strategy $l$ to pure
strategy $k$. Then his utility is $M + \alpha_l \cdot (x_l^* - \delta) - (x_l^* - \delta)^2 + \alpha_k \cdot \delta - \delta^2$, which
is maximized for $\delta = \frac{\alpha_k - \alpha_l + 2x_l^*}{4}$. Notice that $\delta > 0$ since $\alpha_k \ge \alpha_l$ and $x_l^* > 0$, thus the
row player can increase his utility by assigning positive probability to pure strategy $k$,
which contradicts the fact that $x^*$ is a best response. □

Lemma 5.8 implies that there are only $n$ possible supports that a best response can
use. Indeed, we can exploit the KKT conditions to derive, for each candidate support,
the exact probability that each pure strategy would be played. We derive the probability
as a function of the $\alpha_i$s and of the support size. Suppose that the KKT conditions
produce a feasible response when we set the support to have size $k$. From condition (6)
we get that $x_i = \frac{1}{2}(\alpha_i - \lambda)$ for all $1 \le i \le k$ and zero else. But we know that $\sum_{i=1}^{k} x_i = 1$.
Thus we get that $\sum_{i=1}^{k} \frac{1}{2}(\alpha_i - \lambda) = 1$, and if we solve for $\lambda$ we get that $\lambda = \frac{\sum_{i=1}^{k} \alpha_i - 2}{k}$. This
means that for all $i \in [k]$ we get

$x_i^* = \frac{1}{2} \left( \alpha_i - \frac{\sum_{j=1}^{k} \alpha_j - 2}{k} \right)$. (10)

So, our algorithm does the following. It loops through all $n$ candidate supports for a
best response. For each one, it uses Equation (10) to determine the probabilities, and
then checks whether these satisfy the KKT conditions, and thus whether this is a feasible
response. If it is, then it is saved in a list of feasible responses, otherwise it is
discarded. After all $n$ possibilities have been checked, the feasible response with the
highest payoff is returned.
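The loop described above can be sketched in a few lines of Python. This is an illustrative sketch under the section's assumptions ($d_r = 1$ and $\alpha_i = R_i y + 2p_i$), with a function name of our own choosing. As a simplification, it keeps a candidate whenever Equation (10) yields strictly positive probabilities on the prefix support and then returns the candidate with the highest value of the quadratic objective, instead of testing every KKT condition explicitly.

```python
def l22_best_response(alpha):
    """Best response sketch for an L2^2-biased game with d_r = 1.

    alpha -- alpha[i] = R_i y + 2 p_i, the payoff of pure strategy i
    Returns a strategy maximising sum_i x_i * alpha_i - sum_i x_i ** 2.
    """
    n = len(alpha)
    # Order the pure strategies by decreasing payoff (Lemma 5.8).
    order = sorted(range(n), key=lambda i: alpha[i], reverse=True)
    a = [alpha[i] for i in order]

    best_x, best_u = None, None
    prefix_sum = 0.0
    for k in range(1, n + 1):
        prefix_sum += a[k - 1]
        lam = (prefix_sum - 2.0) / k
        # Equation (10): probabilities on the support {1, ..., k}.
        xs = [(a[i] - lam) / 2.0 for i in range(k)]
        if min(xs) <= 0:
            continue  # not a valid response with support size k
        u = sum(a[i] * xs[i] - xs[i] ** 2 for i in range(k))
        if best_u is None or u > best_u:
            best_x, best_u = xs + [0.0] * (n - k), u

    # Undo the sorting so the result is indexed like the input.
    x = [0.0] * n
    for pos, i in enumerate(order):
        x[i] = best_x[pos]
    return x
```

For `alpha = [0.2, 1.0]` the support of size two gives probabilities 0.3 and 0.7. For `alpha = [3.0, 0.1, 0.0]` only the pure strategy survives, because larger supports would require negative probabilities. Sorting dominates the running time, so the sketch runs in $O(n \log n)$ arithmetic operations once the $\alpha_i$ are known.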


Algorithm 5. Best Response Algorithm for $L_2^2$ penalty

(1) For $i = 1 \ldots n$

    (a) Set $x_1 \ge \ldots \ge x_i > 0$ and $x_{i+1} = \ldots = x_n = 0$.

    (b) Check if there is a feasible response under these constraints.

    (c) If so, add it to the list of feasible responses.

(2) Among the feasible responses choose one with the highest utility.


5.2.1. Approximation algorithm. We now show that the base algorithm gives a 5/7-approximation
when applied to $L_2^2$-penalty games. For the row player's regret, we
can use Lemma 5.5 to show that the regret is bounded by $\delta$. However, for the column
player's regret, things are more involved. We will show that the regret of the column
player is at most $2.5 - 2.5\delta$. That analysis depends on the maximum entry of the base
strategy $q$ and more specifically on whether $\max_k \{q_k\} \le 1/2$ or not.

Lemma 5.9. If $\max_k \{q_k\} \le 1/2$, then the regret the column player suffers under
strategy profile $(x^*, y^*)$ is at most $2.5 - 2.5\delta$.

Proof. Note that when $\max_k \{q_k\} \le 1/2$, then $b_c = \|y - q\|_2^2 \le 1.5$ for all possible
$y$. Then, using the analysis from Lemma 5.6 along with the fact that $d_c \cdot b_c(y^*, q) \le 2$
for $L_2^2$ penalties, and since by assumption $d_c = 1$, the claim follows. □

For the case where there is a $k$ such that $q_k > 1/2$ a more involved analysis is needed.
The first goal is to prove that under any strategy $y^*$ that is a best response against $p$
the pure strategy $k$ is played with positive probability. In order to prove that, first it
is proven that there is a feasible response against strategy $p$ where pure strategy $k$ is
played with positive probability. In what follows we denote $\alpha_i := C_i^T p + 2q_i$.

Lemma 5.10. Let $q_k > 1/2$ for some $k \in [n]$. Then there is a feasible response where
pure strategy $k$ is played with positive probability.

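Proof. Note that $\alpha_k > 1$ since by assumption $q_k > 1/2$. Recall from Equation (10)
that in a feasible response $y$ with support $[k]$ it holds that $y_i = \frac{1}{2} \left( \alpha_i - \frac{\sum_{j=1}^{k} \alpha_j - 2}{k} \right)$.

In order to prove the claim it is sufficient to show that $y_k > 0$ when in the KKT
conditions we set $y_i > 0$ for all $i \in [k]$. Or equivalently, to show that
$\alpha_k - \frac{\sum_{i=1}^{k} \alpha_i - 2}{k} = \frac{1}{k} \left( (k - 1) \alpha_k + 2 - \sum_{i=1}^{k-1} \alpha_i \right) > 0$.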


fc-i fe-i 

(fc - l)afe + 2- ^aj>fc + l- ^ (C^x + 2qi) 
i=i i=i 

fc-i 

>fc + l — (A: — 1) — 

> 1 + qfc (since q G A”) 

> 0 . 


(since ak > 1) 


The claim follows. □ 


Next it is proven that the utility of the column player is increasing when he adds
pure strategies $i$ to his support such that $\alpha_i > 1$.

Lemma 5.11. Let $y^k$ and $y^{k+1}$ be two feasible responses with support size $k$ and
$k + 1$ respectively, where $\alpha_{k+1} > 1$. Then $T_c(x, y^{k+1}) > T_c(x, y^k)$.

Proof. Let $y^k$ be a feasible response with support size $k$ for the column player
against strategy $p$ and let $\lambda(k) := \frac{\sum_{i=1}^{k} \alpha_i - 2}{k}$. Then the utility of the column player
when he plays $y^k$ can be written as


n n 

T,(x, y'^) = ^ yf . a, - - q^q 

i=l 

i^l 

(Y+ 

i=l 


1 

4 


E' 


k ■ (A(fc))^ - q'^q. 



The goal now is to prove that T'c(x, — Tc(x, y'^) > 0. By the previous analysis for 
Tc(x, y'") and if A := ch - 2, then 

k~\~ 1 k 

Tc(x,y'=+i) -rc(x,y'=) = {k + 1){X{k + 1))^ -j'^a'^ + k- (A(fc))^ 


i=l 


i=l 


> 

> 

> 



1 


4(fc + l) 
1 

4(fc+ 1) 
1 

4(fc + l) 

0 . 


(^ + n/c+i)^ \ 

fcTi y 

+ ^ “ Q^fe+l “ 2 AQ!fc+l)^ 

(A:a|_i_]^ + — 2Aak+i) 

[k + — 2A) (since 1 < ak+i < 2 and A> k 

{k"^ — 5A: + 8 ) (since A> k — 2) 


2 ) 


□ 

Notice that $\alpha_k \ge 2q_k > 1$. Thus, the utility of the feasible response that assigns positive
probability to pure strategy $k$ is strictly greater than the utility of any feasible
response that does not assign positive probability to $k$. Thus strategy $k$ is always played in a
best response. Hence, the next lemma follows.

Lemma 5.12. If there is a $k \in [n]$ such that $q_k > 1/2$, then in every best response $y^*$
the pure strategy $k$ is played with positive probability.

Using Lemma 5.12 we can now provide a better bound for the regret the column player
suffers, since in every best response $y^*$ the pure strategy $k$ is played with positive
probability.

Lemma 5.13. Let $y^*$ be a best response when there is a pure strategy $k$ with $q_k >
1/2$. Then the regret for the column player under strategy profile $(x^*, y^*)$ is bounded by
$2 - 2\delta$.


Proof. Before we proceed with our analysis we assume without loss of generality 
that k = 1. Recall from the analysis for the Algorithm 1 that the regret for the column 
player is 

7^"(x^y*) < (1 - (5)(max{x^C'y} + 2y^qfc - 2y*"q + y*"y*) 

\ yGA / 

< (l-(5)(l + 2qfe-2y*"q + y*"y*). (11) 

We focus now on the term $y^{*T} y^* - 2y^{*T} q$. It can be proven (see the appendix) that $y^{*T} y^* - 2y^{*T} q \le 1 - 2q_k$.
Thus, from (11) we get that $\mathcal{R}^c(x^*, y^*) \le 2 - 2\delta$. □

Recall now that the regret for the row player is bounded by $\delta$, so if we optimize
with respect to $\delta$ the regrets are equal for $\delta = 2/3$. Thus, when there is a $k$ with
$q_k > 1/2$, Algorithm 1 produces a $2/3$-equilibrium.
Hence, combining this with Lemma 5.9, Theorem 5.14 follows for $\delta = 5/7$.


^Appendix IaI 












Theorem 5.14. In biased games with $L_2^2$ penalties a $5/7$-equilibrium can be computed
in polynomial time.

5.3. Inner product penalty games 

We observe that we can also tackle the case where the penalty function is the inner
product of the strategy played, i.e. $p = q = 0$. For these games, which we call inner
product penalty games, we replace $p$ as the starting point of the base algorithm with
the fully mixed strategy $x^n$. Hence, for that case $x^* = \delta \cdot x^n + (1 - \delta) \cdot x$ for some
$\delta \in [0, 1]$. Again, the regret the row player suffers under strategy profile $(x^*, y^*)$ is
bounded by $\delta$. In Appendix ?? we prove the next lemma.

Lemma 5.15. When the penalty function is the inner product of the strategy played,
then the regret for the row player under strategy profile $(x^*, y^*)$ is bounded by $\delta$.

Furthermore, using similar analysis as in Lemma 5.6 it can be proven that the regret
for the column player under strategy profile $(x^*, y^*)$ is bounded by $(1 - \delta)(1 + d_c \cdot y^{*T} y^*)$.
For the column player we will distinguish between the cases where $d_c \le 1/2$ and $d_c >
1/2$. For the first case, where $d_c \le 1/2$, it is easy to see that the algorithm produces a 0.6-equilibrium.
For the other case, when $d_c > 1/2$, first it is proven that there is no pure
best response.

Lemma 5.16. If the penalty for the column player is equal to $y^T y$ and $d_c > 1/2$, then
there is no pure best response against any strategy of the row player.

Proof. Let $C_j$ denote the payoff of the column player from his $j$-th pure strategy
against some strategy played by the row player. For the sake of contradiction,
assume that there is a pure best response for the column player where, without loss of
generality, he plays only his first pure strategy. Suppose now that he shifts some probability
to his second strategy, that is he plays the first pure strategy with probability $x$
and the second pure strategy with probability $1 - x$. The utility for the column player
under this mixed strategy is $x \cdot C_1 + (1 - x) \cdot C_2 - d_c \cdot (x^2 + (1 - x)^2)$, which is maximized
for $x = \frac{1}{2} + \frac{C_1 - C_2}{4 d_c}$. Notice that $0 < x < 1$, since $|C_1 - C_2| \le 1$ and $d_c > 1/2$, which means
that the column player can deviate
from the pure strategy and increase his utility. The claim follows. □

With Lemma 5.16 in hand, it can be proven that when $d_c > 1/2$ the column player
does not play any pure strategy with probability greater than $3/4$.

Lemma 5.17. If $d_c > 1/2$, then in $y^*$ no pure strategy is played with probability
greater than $3/4$.

Proof. For the sake of contradiction suppose that there is a pure strategy $i$ in $y^*$
that is played with probability greater than $3/4$. Furthermore, let $k$ be the support size
of $y^*$. From Lemma 5.16, since $d_c > 1/2$, we know that there is no pure best response,

thus k> 2. Then using Equation ( HOH we get that . If we solve for 

Oj we get that at > > 1 which is a contradiction since when q = 0 it holds that 

Oi = Cfx < 1 . □ 

A direct corollary of Lemma 5.17 is that $y^{*T} y^* \le 5/8$. Hence, we can prove the
following lemma.

Lemma 5.18. Under strategy profile $(x^*, y^*)$ the regret for the column player is
bounded by $\frac{13}{8}(1 - \delta)$.











Proof. Firstly, note that $T_c(x^*, y^*) = \delta \, x^{nT} C y^* + (1 - \delta) x^T C y^* - y^{*T} y^*$. Moreover,
$\max_{y \in \Delta} \{ x^{nT} C y - y^T y \} - T_c(x^n, y^*) = 0$, since $y^*$ is a best response against $x^n$. Finally,
notice that $0 \le y^T y \le 1$ for all $y$. Thus, the regret for the column player is

$\mathcal{R}^c(x^*, y^*) \le (1 - \delta) \left( \max_{y \in \Delta} \{ x^T C y - y^T y \} - x^T C y^* + y^{*T} y^* \right)$

$\le (1 - \delta) \left( 1 + \frac{5}{8} \right)$,

which matches the claimed result. □

If we combine Lemmas 5.15 and 5.18 and solve for $\delta$ we can see that the regrets are
equal for $\delta = \frac{13}{21}$. Thus, we get the following theorem for biased games where $q = 0$.

Theorem 5.19. The strategy profile $(x^*, y^*)$ is a $\frac{13}{21}$-equilibrium for biased games
with $q = 0$.

5.4. A 2/3-approximation for $L_\infty$-biased games

Finally, we turn our attention to the $L_\infty$ penalty. We start by giving a combinatorial
algorithm for finding best responses. Similar to the best response algorithm for the
$L_1$ penalty, the intuition is to start from the base strategy $p$ of the row player and
shift probability from pure strategies with low payoff to pure strategies with higher
payoff. This time though, the shifted probability will be distributed between the pure
strategies with higher payoff.

Without loss of generality assume that $R_1 y \ge \ldots \ge R_n y$, i.e., that the strategies
are ordered according to their payoff in the unbiased bimatrix game. The set of pure
strategies of the row player can be partitioned into three disjoint sets according to the
payoff they yield:

$\mathcal{H} := \{ i \in [n] : R_i y = R_1 y \}$

$\mathcal{M} := \{ i \in [n] \setminus \mathcal{H} : R_1 y - R_i y - d_r \le 0 \}$

$\mathcal{L} := \{ i \in [n] : R_1 y - R_i y - d_r > 0 \}$.
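The partition is straightforward to compute from the payoff vector. The sketch below is a minimal Python illustration with names of our own choosing. It assumes that the middle set uses a non-strict inequality and the low set a strict one, so that the three sets cover all pure strategies without overlap.

```python
def partition_strategies(payoffs, d):
    """Split pure strategies into the sets H, M and L used for the L-infinity penalty.

    payoffs -- payoffs[i] is R_i y
    d       -- penalty weight d_r (assumed non-negative)
    """
    top = max(payoffs)
    # H: strategies that attain the highest payoff.
    H = [i for i, v in enumerate(payoffs) if v == top]
    # M: strategies whose payoff gap to the top is at most d.
    M = [i for i, v in enumerate(payoffs) if v != top and top - v - d <= 0]
    # L: strategies whose payoff gap to the top exceeds d.
    L = [i for i, v in enumerate(payoffs) if top - v - d > 0]
    return H, M, L
```

For `payoffs = [1.0, 1.0, 0.75, 0.25]` and `d = 0.5`, the first two strategies form $\mathcal{H}$, the third lies in $\mathcal{M}$, and the fourth lies in $\mathcal{L}$.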

Next we give an algorithm that computes a best response for the $L_\infty$ penalty.

Let $p_{\max} := \max_{i \in \mathcal{L}} p_i$ and let $\mathcal{P} := \sum_{i \in \mathcal{L}} p_i$. For the best response the following
lemma holds.












Lemma 5.20. If $\mathcal{L} \neq \emptyset$ then for any best response $x$ of the row player against
strategy $y$ it holds that $\|x - p\|_\infty \ge p_{\max}$. Else $p$ is the best response.

Proof. Using similar arguments as in Lemma 5.1 it can be proven that if there are
no pure strategies $i$ and $k$ such that $R_k y - R_i y - d_r > 0$ then any shifting of probability
decreases the utility of the row player. Thus, the best response of the player is $p$. On
the other hand, if there are strategies $i$ and $k$ such that $R_k y - R_i y - d_r > 0$, then the
utility of the row player increases if all the probability from strategy $i$ is shifted to pure
strategy $k$. The set $\mathcal{L}$ contains all these pure strategies. Let $j \in \mathcal{L}$ be the pure strategy
that defines $p_{\max}$. Then, all the $p_{\max}$ probability can be shifted from $j$ to a pure
strategy in $\mathcal{H}$, i.e. a pure strategy that yields the highest payoff, and strictly increase
the utility of the player. Thus, the strategy $j$ is played with zero probability and the
claim follows. □

In what follows assume that $\mathcal{L} \neq \emptyset$, hence $p_{\max} > 0$. From Lemma 5.20 it follows
that there is a best response where the strategy with the highest payoff is played
with probability $p_1 + p_{\max}$. Hence, up to $p_{\max}$ probability can be shifted from pure
strategies with lower payoff to each pure strategy with higher payoff, starting from
the second pure strategy, and so on. After this shift of probabilities there will be a set of pure
strategies where each one is played with probability $p_i + p_{\max}$ and possibly one
pure strategy $j$ that is played with probability less than or equal to $p_j$. The question is
whether more probability should be shifted from the low payoff strategies to strategies
that yield higher payoff. The next lemma establishes that no pure strategy from $\mathcal{L}$ is
played with positive probability in any best response against $y$.

Lemma 5.21. In every best response against strategy $y$ all pure strategies $i \in \mathcal{L}$ are
played with zero probability.

Proof. Let $K$ denote the set of pure strategies that are played with positive
probability after the first shifting of probabilities. Without loss of generality assume
that each strategy $i \in K$ is played with probability $p_i + p_{\max}$. Then the utility of the
player under this strategy is equal to $U = \sum_{i \in K} (p_i + p_{\max}) \cdot R_i y - d_r \cdot p_{\max}$. For the
sake of contradiction, assume that there is one strategy $j$ from $\mathcal{L}$ that belongs to $K$.
Suppose that probability $\delta$ is shifted from the strategy $j$ to the first pure strategy. Then
the utility for the player is equal to $U + \delta (R_1 y - R_j y - d_r) > U$, since by definition of $\mathcal{L}$
we have $R_1 y - R_j y - d_r > 0$. Thus, the utility of the player increases if probability is shifted.
Notice that the analysis holds even if the penalty is $p_{\max} + \delta$ instead of $p_{\max}$, thus the
claim follows. □

Thus, all the probability $\mathcal{P}$ from strategies in $\mathcal{L}$ should be shifted to strategies
that yield higher payoff. The question now is what is the optimal way to distribute that
probability over the strategies with the higher payoff. Clearly, the same amount of
probability should be shifted to all strategies in $\mathcal{H}$, since this makes the penalty smaller.
Furthermore, it is easy to see that the maximum amount of probability is shifted to
strategies in $\mathcal{H}$. Next we prove that if $\mathcal{P} > p_{\max} \cdot (|\mathcal{H}| + |\mathcal{M}|)$ then $\mathcal{P}$ is uniformly
distributed over the pure strategies in $\mathcal{H} \cup \mathcal{M}$.

Claim. If $\mathcal{P} > p_{\max} \cdot (|\mathcal{H}| + |\mathcal{M}|)$ then there is a best response where the probability
$\mathcal{P}$ is uniformly distributed over the pure strategies in $\mathcal{H} \cup \mathcal{M}$.

Proof. Let $|\mathcal{H}| + |\mathcal{M}| = k$ and $\delta = \mathcal{P} - k \cdot p_{\max}$. Let

$U = \sum_{i \in \mathcal{H} \cup \mathcal{M}} \left( p_i + p_{\max} + \frac{\delta}{k} \right) \cdot R_i y - d_r \left( p_{\max} + \frac{\delta}{k} \right)$

be the utility when the probability $\delta$ is distributed uniformly over all pure strategies
in $\mathcal{H} \cup \mathcal{M}$. Furthermore, let $U'$ be the utility when $\delta > 0$ probability is shifted from
a pure strategy $j$ to the first pure strategy that yields the highest payoff. Then $U' =
U + \delta (R_1 y - R_j y - d_r)$, but $R_1 y - R_j y - d_r \le 0$ since $j \in \mathcal{H} \cup \mathcal{M}$. The claim follows. □

Using the previous analysis the correctness of the algorithm follows.

Note that, using similar arguments as in Lemma 5.1, the next lemma can be proved.

Lemma 5.22. If $d_r \ge 1$, then $p$ is a dominant strategy.

Furthermore, the combination of Lemma 5.22 with the fact that best responses can
be computed in polynomial time gives the next theorem.

Theorem 5.23. In biased games with $L_\infty$ penalty functions and $\max\{d_r, d_c\} \ge 1$,
an equilibrium can be computed in polynomial time.

Again we can see that there is a connection between the equilibria of the distance biased
game and the well supported Nash equilibria (WSNE) of the underlying bimatrix
game.

Observation 1. Let $B = (R, C, b_r(x, p), b_c(y, q), d_r, d_c)$ be a distance biased game
with $L_\infty$ penalties and let $d := \max\{d_r, d_c\}$. Any equilibrium of $B$ is a $d$-WSNE for the
bimatrix game $(R, C)$.

5.4.1. Approximation algorithm. For the quality of approximation, we can reuse the results
that we proved for the $L_1$ penalty. Lemma 5.5 applies unchanged. For Lemma 5.6
we observe that $d_c \cdot b_c(y^*, q) \le 1$ when the penalty $b_c(y^*, q)$ is the $L_\infty$ norm, since for
this case it holds that $\|y^* - q\|_\infty \le 1$ and it is assumed that $d_c < 1$. Thus, we have the
following theorem.

Theorem 5.24. In biased games with $L_\infty$ penalties a $2/3$-equilibrium can be computed
in polynomial time.

6. CONCLUSIONS 

We have studied games with infinite action spaces, and non-linear payoff functions. 
We have shown that Lipschitz continuity of the payoff function can be exploited to 
provide algorithms that find approximate equilibria. For Lipschitz games, we showed 
that Lipschitz continuity of the payoff function allows us to provide an efficient algo¬ 
rithm for finding approximate equilibria. For penalty games, the Lipschitz continuity 
of the penalty function allows us to provide a QPTAS. Finally, we provided strongly 
polynomial approximation algorithms for $L_1$, $L_2^2$, and $L_\infty$ distance biased games.

Several open questions stem from our paper. The most important one is to understand
the exact computational complexity of equilibrium computation in Lipschitz and
penalty games. Although Theorem 2.2 states that there is no FPTAS for penalty games,
the result holds only for games with penalty functions that depend on the size of the
game and tend to zero as the size grows. Another interesting feature is that we cannot
verify efficiently in all penalty games whether a given strategy profile is an equilibrium,
and so it seems questionable whether PPAD can capture the full complexity of
penalty games. On the other hand, for the distance biased games that we studied in
this paper, we have shown that we can decide in polynomial time if a strategy profile
is an equilibrium. Is the equilibrium computation problem PPAD-complete for the two
classes of games we studied? Are there any subclasses of penalty games, e.g. when the
underlying normal form game is zero sum, that are easy to solve?




Another obvious direction is to derive better polynomial time approximation guarantees
for biased games. We believe that the optimization approach used
by [Tsaknakis and Spirakis 2008] and [Deligkas et al. 2015] might tackle this problem.
Under the $L_1$ penalties the analysis of the steepest descent algorithm may be similar
to [Deligkas et al. 2015] and therefore we may be able to obtain a constant approximation
guarantee similar to the bound of 0.5 that was established in that paper. The other
known techniques that compute approximate Nash equilibria [Bosse et al. 2010] and
approximate well supported Nash equilibria [Czumaj et al. 2015; Fearnley et al. 2012;
Kontogiannis and Spirakis 2010] solve a zero sum bimatrix game in order to derive
the approximate equilibrium, and there is no obvious way to generalise this approach
to penalty games.


REFERENCES 

Yaron Azrieli and Eran Shmaya. 2013. Lipschitz Games. Math. Open Res. 38, 2 (2013), 
350-357. 

Yakov Babichenko. 2013. Best-reply dynamics in large binary-choice anonymous 
games. Games and Economic Behavior 81 (2013), 130-144. 

Siddharth Barman. 2015. Approximating Nash Equilibria and Dense Bipartite Sub¬ 
graphs via an Approximate Version of Caratheodory’s Theorem. In Proc. of STOC 
2015. 361-369. 

H. Bosse, J. Byrka, and E. Markakis. 2010. New algorithms for approximate Nash 
equilibria in bimatrix games. Theoretical Computer Science 411, 1 (2010), 164-173. 

Ioannis Caragiannis, David Kurokawa, and Ariel D. Procaccia. 2014. Biased Games. 
In Proc. of AAAI 2014. 609-615. 

Gretchen B. Chapman and Eric J. Johnson. 1999. Anchoring, Activation, and the 
Construction of Values. Organizational Behavior and Human Decision Processes 79, 2 
(1999), 115-153. 

Xi Chen, Xiaotie Deng, and Shang-Hua Teng. 2009. Settling the complexity of 
computing two-player Nash equilibria. J. ACM 56, 3 (2009), 14:1-14:57. 

Xi Chen, David Durfee, and Anthi Orfanou. 2015. On the Complexity of Nash 
Equilibria in Anonymous Games. In Proc. of STOC 2015. 381-390. 

Artur Czumaj, Argyrios Deligkas, Michail Fasoulakis, John Fearnley, Marcin 
Jurdziński, and Rahul Savani. 2015. Distributed Methods for Computing Approximate 
Equilibria. (2015). 

Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. 2009. The 
Complexity of Computing a Nash Equilibrium. SIAM J. Comput. 39, 1 (2009), 195-259. 

Constantinos Daskalakis, Aranyak Mehta, and Christos H. Papadimitriou. 2007. 
Progress in approximate Nash equilibria. In Proc. of EC. 355-358. 

Constantinos Daskalakis, Aranyak Mehta, and Christos H. Papadimitriou. 2009. A 
note on approximate Nash equilibria. Theoretical Computer Science 410, 17 (2009), 
1581-1588. 

Constantinos Daskalakis and Christos H. Papadimitriou. 2014. Approximate Nash 
equilibria in anonymous games. Journal of Economic Theory (2014). To appear. 

Joyee Deb and Ehud Kalai. 2015. Stability in large Bayesian games with 
heterogeneous players. Journal of Economic Theory 157, C (2015), 1041-1055. 

Argyrios Deligkas, John Fearnley, Rahul Savani, and Paul Spirakis. 2015. Computing 
Approximate Nash Equilibria in Polymatrix Games. In Algorithmica. To appear. 

John Fearnley, Paul W. Goldberg, Rahul Savani, and Troels Bjerre Sørensen. 2012. 
Approximate Well-Supported Nash Equilibria Below Two-Thirds. In SAGT. 108-119. 

Amos Fiat and Christos H. Papadimitriou. 2010. When the Players Are Not 
Expectation Maximizers. In Algorithmic Game Theory - Third International Symposium, 
SAGT 2010, Athens, Greece, October 18-20, 2010. Proceedings. 1-14. 

Daniel Kahneman. 1992. Reference points, anchors, norms, and mixed feelings. 
Organizational Behavior and Human Decision Processes 51, 2 (1992), 296-312. 

Spyros C. Kontogiannis and Paul G. Spirakis. 2010. Well Supported Approximate 
Equilibria in Bimatrix Games. Algorithmica 57, 4 (2010), 653-667. 

M.K. Kozlov, S.P. Tarasov, and L.G. Khachiyan. 1980. The polynomial solvability of 
convex quadratic programming. USSR Computational Mathematics and 
Mathematical Physics 20, 5 (1980), 223-228. 

Richard J. Lipton, Evangelos Markakis, and Aranyak Mehta. 2003. Playing large 
games using simple strategies. In EC. 36-41. 

Marios Mavronicolas and Burkhard Monien. 2015. The Complexity of Equilibria for 
Risk-Modeling Valuations. CoRR abs/1510.08980 (2015). 

John Nash. 1951. Non-Cooperative Games. The Annals of Mathematics 54, 2 (1951), 
286-295. 

J. B. Rosen. 1965. Existence and Uniqueness of Equilibrium Points for Concave 
N-Person Games. Econometrica 33, 3 (1965), 520-534. 

Haralampos Tsaknakis and Paul G. Spirakis. 2008. An Optimization Approach for 
Approximate Nash Equilibria. Internet Mathematics 5, 4 (2008), 365-382. 

Amos Tversky and Daniel Kahneman. 1974. Judgment under Uncertainty: Heuristics 
and Biases. Science 185, 4157 (1974), 1124-1131. 



A. PROOF THAT $y^{*T} y^* - 2 y^*_k q_k \le 1 - 2 q_k$ 

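Proof. Notice from (10) that for all $i$ we get $y^*_i = y^*_k + \frac{1}{2}(\alpha_i - \alpha_k)$. 
Using that, we can write the term $y^{*T} y^*$ as follows when $y^*$ has support size $s$: 

\begin{align*}
y^{*T} y^* &= \sum_{i=1}^{s} (y^*_i)^2 \\
&= (y^*_k)^2 + \sum_{i \neq k} \left( y^*_k + \tfrac{1}{2}(\alpha_i - \alpha_k) \right)^2 \\
&= s (y^*_k)^2 + \sum_{i \neq k} (\alpha_i - \alpha_k) \, y^*_k + \tfrac{1}{4} \sum_{i \neq k} (\alpha_i - \alpha_k)^2 .
\end{align*}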

Then we can see that $y^{*T} y^* - 2 y^*_k q_k$ is increasing as $y^*_k$ increases, since we know 
from Lemma 5.12 that $y^*_k > 0$. This becomes clear if we take the partial derivative of 
$y^{*T} y^* - 2 y^*_k q_k$ with respect to $y^*_k$, which is equal to 

\begin{align*}
2 s y^*_k + \sum_{i \neq k} (\alpha_i - \alpha_k) - 2 q_k
&= 2 s y^*_k + 2 \sum_{i \neq k} y^*_i - 2 (s-1) y^*_k - 2 q_k && (\text{since } y^*_i = y^*_k + \tfrac{1}{2}(\alpha_i - \alpha_k)) \\
&= 2 \sum_{i=1}^{s} y^*_i - 2 q_k \\
&= 2 - 2 q_k \\
&> 0 && (\text{since } y^*_k > 0).
\end{align*}

Thus, the value of $y^{*T} y^* - 2 y^*_k q_k$ is maximized when $y^*_k = 1$ and our claim follows. □ 
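
The two algebraic steps in the proof above can be sanity-checked numerically. The following Python sketch is only an illustration under stated assumptions: the function name `check_identities`, the randomly drawn values standing in for the $\alpha_i$, and the choice of $q_k$ as a number in $[0, 1]$ are all hypothetical and not taken from the paper. It checks the expansion of $y^{*T} y^*$ in terms of $y^*_k$, and that the derivative with respect to $y^*_k$ equals $2 \sum_i y^*_i - 2 q_k$. It does not verify the claim itself.

```python
import random


def check_identities(s=5, k=0, seed=0, h=1e-6):
    """Numerically check the two algebraic identities used in the proof."""
    rng = random.Random(seed)
    # Hypothetical stand-ins for alpha_i; the small range keeps every y_i positive.
    alpha = [rng.uniform(-0.1, 0.1) for _ in range(s)]
    # Assumption: q_k is treated as one coordinate of a probability vector.
    q_k = rng.uniform(0.0, 1.0)

    # Differences (alpha_i - alpha_k) for i != k, and their sums.
    diffs = [alpha[i] - alpha[k] for i in range(s) if i != k]
    s1 = sum(diffs)
    s2 = sum(d * d for d in diffs)

    # Choose y_k so that the entries y_i = y_k + (alpha_i - alpha_k) / 2 sum to 1.
    y_k = (1.0 - 0.5 * s1) / s
    y = [y_k + 0.5 * (alpha[i] - alpha[k]) for i in range(s)]

    # Identity 1: y^T y = s*y_k^2 + sum(diffs)*y_k + (1/4)*sum(diffs^2).
    lhs = sum(v * v for v in y)
    rhs = s * y_k ** 2 + s1 * y_k + 0.25 * s2

    # Identity 2: d/dy_k [y^T y - 2*y_k*q_k] = 2*sum(y) - 2*q_k,
    # compared against a central finite difference of the expanded form.
    def objective(t):
        return s * t ** 2 + s1 * t + 0.25 * s2 - 2.0 * t * q_k

    numeric = (objective(y_k + h) - objective(y_k - h)) / (2.0 * h)
    closed_form = 2.0 * sum(y) - 2.0 * q_k

    return {
        "expansion_gap": abs(lhs - rhs),
        "derivative_gap": abs(numeric - closed_form),
        "sum_y": sum(y),
        "min_y": min(y),
    }


if __name__ == "__main__":
    print(check_identities())
```

Because the entries of $y^*$ are built to sum to one, the closed-form derivative in the sketch reduces to $2 - 2 q_k$, matching the last step of the derivation.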




